MongoDB: Step-5: Indexes and Query Performance Optimizations
This covers the various options of indexes, how to evaluate the query, understand the query-plan and the usage of indexes for different business scenarios.
Introduction
MongoDB supports extensive feature indexes that enable rich querying. i.e. over and above the index on the primary key, it has the ability to index different columns as secondary indexes. This is often a limitation in many NoSql solutions. Also, multiKey indexes are allowed. i.e. the indexing can be done on more than one column. The sorting in the multi-column index is left ordered.
The indexes have the data sorted in a specific order. The indexes are not only used for matching but for sorting too. The mongo query planner decides what index and what plan to choose.
Types of Indexes
- BTree — normal indexes
- Partial index — special-purpose index for specific queries
- TTL index — for auto-expiry of documents
- Text indexes — for text search in long strings, like searching a document
- Geo indexes — for sorting by proximity to a certain point.
- Hashed indexes used for sharding. It allows key values to be hashed for even distribution of the keys. Used in sharding for uniform distribution of documents.
Query Planner — How mongo executes the query
When a query is fired, mongo’s query planner examines the structure of the query, the fields that are mentioned in the matching criteria, the sort part, etc. The query planner checks if there is already a plan for it. When it does not find a plan, it prepares several plans for the execution of the query using different combinations of the indexes that are present. It then runs a race for 100 ms. The plan that returns the document is chosen as the preferred plan while other plans are rejected.
The query plan examination is done under two situations
- The query structure is new, something that mongo has not seen before
- There are some new indexes added that the query planner may choose to examine for use.
Explain — Plan for the query execution
The find() method returns the cursor. But if you call the function to explain() on the cursor, the command will not run the query but instead will display the plan used by the query planner for executing the query. This command shows you the winning-plan and the rejected-plans used by the query planner.
Example — Let’s create some data
db.inventory.insertMany([
{ item: “journal”, qty: 25, tags: [“blank”, “red”], size: { h: 14, w: 21, uom: “cm” } },
{ item: “mat”, qty: 85, tags: [“gray”], size: { h: 27.9, w: 35.5, uom: “cm” } },
{ item: “mousepad”, qty: 25, tags: [“gel”, “blue”], size: { h: 19, w: 22.85, uom: “cm” } }
])
db.inventory.find({“item”:”mousepad”}).explain()
WinningPlan
The winning plan refers to the final plan decided by the query planner which will be used for the actual query execution at runtime.
RejectedPlans
The rejected plans refer to all the other plans that were tried by the query planner before it finalized on the winning plan
COLLSCAN
The “winningPlan” section mentions that there is a full collection span. A query that does a COLLSCAN is a bad performing query. This query actually checks every document in the collection to come back and so results in load for mongo and therefore keeps the server busy from doing other operations. Such a query will not be able to scale when the volume of data increases in the database.
Candidate queries for optimization
For a volume database, you should identify the long-running queries and frequently run queries and they should be checked for optimization. The mongo profiler will help you identify such queries for performance optimization.
Indexes
The indexes keep the data in sorted order. So when you query it is able to quickly transverse the index using its efficient search algorithm and gets to the document that matches the criteria. It now gets the location of the document using which it accesses the document and retrieves the data.
Syntax
db.<collection>.createIndex({keys}, options)
db.inventory.createIndex({“item”:1})
where,
item → The field on which the index is created
1 → The sort order of the index, Asc →1, Desc → -1
Observe — The numIndexesAfter has increased by 1.
List indexes on a collection
db.inventory.getIndexes()
Observe:
- You see the keys on which the index was created along with sortOrder
- The name of the index is “item_1”. By default, it is a combination of the field name and the sortOrder. Normally you would give a custom name to the index for ease of maintenance
Index Name Limitations
By default, the index name is a combination of field+sortOrder of the index.
The max limit for the index name is 128 bytes. This limitation is removed from 4.2 onwards. But giving a custom name to the index is often useful in the long term maintenance of the index.
This is typically not an issue but if you have a combination index using deeply nested fields, you can easily cross 128 bytes. It is therefore recommended to give your own custom index name when creating the index.
The index name is also used for referring it, like when dropping and index.
Create Index Option — name
db.inventory.createIndex({“item” : 1, “qty” : 1, “size.w” : 1 }, {name : ”CustomNameABC”})
Observe — The new index has now got the custom name that we have given “CustomNameABC”.
Dropping an Index
The index can be dropped using a dropIndex() command. The index on _id being mandatory cannot be dropped.
dropIndex(<index-name>
db.inventory.dropIndex(“item_1”)
Create Index Option: unique
db.inventory.createIndex({“item” : 1}, {name:”ItemAsc”,unique:true})
You can create a unique index by providing this keyword. This will also work like a DB level constraint. If there are duplicate values for the field that is indexed, the index creation will throw an error indicating duplicate keys and will fail. Also, two documents having a non-existing field are treated as duplicate null values and the unique-index will not be created.
db.inventory.getIndexes()
Observe the “unique” key present when describing this index. An exception is an index on the _id field where the unique keyword will not be seen in the description but nevertheless it is a unique index.
Create Index Option — background
By default, the index is created in the foreground. The indexes if created in the foreground would stop all the requests(read/write operations) coming to MongoDB.
For very large collections this is important as index creation can take a lot of time. So typically indexes are created in the background so that requests are serviced at the time of index building. Creating indexes in the background can lead to huge indexes as compared to indexes created in the foreground.
db.inventory.createIndex({“qty” : 1}, {name : ”Quantity_1”, background:true})
Index usage by the query planner
db.inventory.find({“item”:”xyz”}).explain()
The output of the command is long, we will focus on only the important aspects of the result.
Observe: This query is making use of the combination index CustomNameABC.
Observe: The query planner has rejected the plan that was created using the index “item_1”
db.inventory.find({“qty”:100}).explain()
Observe: Here, the query planner had only one good plan which is also the winning plan.
db.inventory.find({“qty”:100, “item”:”mousePad”}).explain()
Hint — Suggesting the Query Planner use a specific index
There are times when the query planner does not pick the correct index for the query. In such cases, we can use hint(<index-name>) on the cursor to recommend to mongo to use the index provided.
db.<collectionName>.find(<various parameters>).hint()
db.inventory.find({“qty”:100, “item”:”mousePad”}).hint(“CustomNameABC”).explain()
db.inventory.find({“qty”:100, “item”:”mousePad”}).hint(“Quantity_1”).explain()
Observe:
- The winning plan is using the suggested index — this time “Quantity_1” though the query is the same.
- The query had 2 fields. The index was only one field. Mongo 1st uses the index for filtering one field. For the returned documents it then applies the 2nd condition.
Additional fields for the compound index — Sort using the index
The sort also can use the index in Mongo. Sorting is a very costly operation, so it is recommended that for queries that have sorting requirements, we also have the sort field as fields in the index after the fields in the match criteria. For example: For a query that looks like
db.inventory.find({“qty”:100}).sort({“tag”:1}).explain()
The above query is more optimized if we have an index on “qty,tag” instead of just the field qty.
db.inventory.createIndex({“qty”:1, “tag”:1})
Additional fields for compound index: The project fields
If all the data that needs to be returned is within the index, then Mongo chooses not to go to the collection and thereby saving one search and retrieval. So if there are some frequent queries that require some minimal fields you design your indexes to use this feature.
Sparse Index
For fields that don’t exist, the regular b-tree index will save null for the field and will save the document as an entry in the index. For the fields in documents that are sparse (present rarely and mostly non-existent), the index is still huge with a lot of null documents.
The use of a sparse index can optimize the query turnaround time very much. The sparse index contains the documents only if the field on which sparse index is created exists (including null values). Depending on how sparsely the field is, the index size can be therefore very small.
Match criteria — With less index size and search is very optimized and this works very well for match criteria.
Sort pitfall — The devil is in the details. The sparse index can also be used by Mongo for the sort operation. And while it does so, it can remove the documents in the sort pipeline when it does not find the document in the sparse index.
db.scores.insertMany([
{ _id : 1, "userid" : "newbie" },
{ _id : 2, "userid" : "abby", "score" : 82 },
{ _id : 3, "userid" : "nina", "score" : 90 }
])db.scores.createIndex( { score: 1 } , { sparse: true } )db.scores.find().count()db.scores.find().sort( { score: -1 } ).hint( { score: 1 } ).count()
Recommendation — When using a sparse index be very watchful of what kind of sort queries can come to Mongo. And if you are not 100% sure of the kind of sort queries that can come or that level of optimization is not required or the sparse field is not sparse enough, avoid sparse index.
Partial Index
This is the trump card amongst all the full-featured indexes that Mongo offers. This index takes query criteria as input and this index is put into use only if a query comes with a matching query structure comes across.
Use — This can be used when your data in the same collection has exclusive datasets and the query typically comes within one such exclusive set and your query is very frequent.
Advantages — The size of the index reduces to have the data that only that matches the query criteria, thereby reducing the query set.
Example — You can achieve phenomenal optimization of the query using this index. By using an index, the query turnaround time could come down from an hour to potentially milliseconds.
All related links
https://medium.com/@sarada.sastri/mongodb-step-1-an-overview-791e4fe861aa
https://medium.com/@sarada.sastri/mongodb-step-2-setting-up-the-server-4b0fad446c3e
https://medium.com/@sarada.sastri/mongodb-step-3-mongo-client-commands-4366b5ef139f
https://medium.com/@sarada.sastri/mongodb-step-4-designing-the-schema-8c91ee947230
https://medium.com/@sarada.sastri/mongodb-step-5-indexes-and-query-performance-optimizations-ed1bf744315b