Table of Contents
Let’s talk about MongoDB Indexes and its associated topics. We all know that indexes can improve the efficiency of read operations. Scanning the entire collection for a single document is not good. For example: You have 1 million documents in a collection and you’re looking for a specific document with filter criteria – if you don’t have proper indexes supporting the queries, it will scan the entire collection and return the result. You have received the results, but your intention is to retrieve them in an optimal amount of time. Here we are talking about 1 million documents, but in the real-world scenario, it scales up to much higher number of documents.
With the help of indexes, we can avoid the collection of scans. Indexes are special data structures that store a small portion of the collection’s data set in an easy-to-traverse form. The index stores the value of a specific field or a set of fields, ordered by the value of the fields. The ordering of the index entries supports the efficient equality matches and range-based query operations. In addition, MongoDB can also return sorted results by using the ordering in the index. Let’s try to understand this with the help of some examples.
Default indexes
Created a new database with the name search and a collection with the name esr. We simply added a sample document {name:”Ashnik”} to esr collection, verified the data and indexes. Let’s check the default index {“_id”: 1}. We can’t delete ‘ _id ’ fields but can update the values. If we are not inserting any data into the’ _id ‘ field, MongoDB will generate a unique ObjectId.
We can see the ObjectId in the screenshot below.
Creating Index
The following example creates a single field ascending index on the name field. We can create a descending order by changing 1 to -1.
db.esr.createIndex({name: 1})
Now, we have 2 indexes. One on _id and another on name fields. The default name for an index is the concatenation of the indexed keys and each key’s direction in the index (i.e., 1 or -1) using underscores as a separator. For example, an index created on {item: 1, quantity: -1} has the name item_1_quantity_-1.
You can create indexes with a custom name, such as the one that is more human-readable than the default. For example, consider an application that frequently queries the product collection to populate data on an existing inventory. The following createIndex() method generates an index on item and quantity named query for inventory.
db.products.createIndex( |
{item: 1, quantity: –1}, |
{name: “query for inventory”} |
) |
Index types
MongoDB provides different index types to support the specific types of data and queries.
Let’s discuss the single field and compound indexes to understand more about index prefix, ESR, and the sort order.
Single Field Index
MongoDB provides a complete support for indexes on any type of field in a collection of documents. By default, all collections have an index on the _id field. We can add additional indexes to support our application queries.
Earlier we had created an ascending index on the name field of the esr collection. 1 defines an ascending order and -1 is the descending order.
db.esr.createIndex({name: 1})
Now, we have added location (embedded document) to the existing document. Look at the document below. Let’s see how we can create a single field index on the embedded documents.
The following operation creates an index on the location.city field
db.esr.createIndex({“location.city”: 1})
The location field is an embedded document, containing the fields city and the address. The following command creates an index on the location field.
db.esr.createIndex({location: 1})
Compound Index
MongoDB supports compound indexes, wherein a single index structure holds the references to multiple fields within a collection’s documents.
db.collection.createIndex ( { <field1>: <type>, <field2>: <type2>, … } )
Now, we have added another document to the esr collection which contains item, category, location and stock fields.
The following operation creates an ascending index on the item and stock fields
db.esr.createIndex({ “item”: 1, “stock”: 1 })
Below queries will use index ({ “item”: 1, “stock”: 1 }. Order of the fields is not important but the prefix is important.
db.esr.find( { item: “Apple” }) |
db.esr.find( { item: “Apple”, stock: { $lt: 5 }})
db.esr.find( { stock: { $lt: 5 }, item: “Apple”}) but not db.esr.find( { stock: { $lt: 5 }) |
In addition to supporting queries that match on all the index fields, compound indexes can also support queries that match on the prefix of the index fields. That is, the index supports queries on the item field as well as both item and stock fields. Index prefixes are the beginning subsets of indexed fields. For example, consider the following compound index.
{“item”: 1, “location”: 1, “stock”: 1}
The index has the following index prefixes.
- {item: 1}
- {item: 1, location: 1}
Following queries will use the Index
db.esr.find( { item: “Apple” })
db.esr.find( { item: “Apple”, location: “36 Robinson Road”}) |
db.esr.find( { item: “Apple”,location: “36 Robinson Road”, stock: { $lt: 5 }})
but not db.esr.find( { location: “36 Robinson Road”, stock: { $lt: 5 }}) |
Let’s try to get more index information using the collection stats command. Stats with indexDetails:true option provides more information like how many indexes, each index file size and where that index is stored and what is the file name.
Now, we clearly know that
- esr collection is stored in the file collection-0-8597612610530926898.wt
- _id index is stored in index-1-8597612610530926898.wt
- Name_1 index is stored in index-4-8597612610530926898.wt
For each collection or index, the WiredTiger storage engine will write an individual file that is the MDB catalog file containing the catalog of all different collections and indexes.
By default, the index size displays in bytes. You can specify the scale to show it in KB or MB or GBs. Now, we know how much space each index occupies. We can use this information when you think about configuration or cache changes.
Index Limitations
- A single collection can have no more than 64 indexes.
- There can be no more than 32 fields in a compound index.
- Multikey indexes cannot cover queries over array field(s).
- Geospatial indexes cannot cover a query.
- You cannot hide the _id index.
- You cannot use hint() on a hidden index.
E S R – Equality Sort Range
We work on a movie collection to understand more about the ESR rules. Let’s explore what is present by looking into the data.
Query
db.movies.find({‘type’:’movie’,’tomatoes.production’:’Universal Pictures’,’runtime’:{$gt:50}}).sort({year:-1}).explain(‘executionStats’)
Our query contains equality (‘type’:’movie’,’tomatoes.production’:’Universal Pictures’), sort ({year:-1}) and range (‘runtime’:{$gt:50}). We can find the execution stats by using explain. Let’s check the winningPlan details for now. The entire collection scan has happened, and it examined 23539 documents and 371 documents were returned. We did not create any Index as of now. Let’s create an Index and verify this again.
The compound Index on type, tomatoes.production, year and runtime fields, we can verify using the getIndexes method. Order of the fields while creating an Index will matter. Follow ESR rules and decide the order of the fields.
We run the same query and verify the execution stats again. Look at the stage: ‘IXSCAN’ . It uses an Index. We can find which index is used in the indexName. Look at totaldocsExamined and nReturned are showing 371. Instead of scanning the entire collection, it scans only Index and returns the required output. Working with indexes is always faster than a collection scan. Most of the time Indexes will fit into the memory. That will help us to gain better performance. By contrast, many number of Indexes will affect write performance. From time to time we must check our indexes and remove unused Indexes.
Now, we jumble the order of the fields Equality and Range. Let’s see if our index will be useful or not. The result is, it uses an Index. Check the execution stats to confirm.
Query
db.movies.find({‘runtime’:{$gt:50},’tomatoes.production’:’Universal Pictures’,’type’:’movie’}).sort({year:-1}).explain(‘executionStats’)
We added one more sort field ‘award.wins’:-1. Remember that we did not include this field in the index. Look at the execution stats. Now also IXSCAN has happened but we can see that extra SORT due to the ‘award.wins’ field.
Query
db.movies.find({‘type’:’movie’,’tomatoes.production’:’Universal Pictures’,’runtime’:{$gt:50}}).sort({year:-1,’award.wins’:-1}).explain(‘executionStats’)
Again ,we have created one more Index. Look at the available indexes below.
db.movies.createIndex({‘type’: 1, ‘tomatoes.production’: 1,’year’:-1,’award.wins’:-1,’runtime’:1})
Check the execution stats for the query below. It used our latest created Index. Check winning and rejected plans. Winning plan contains our recently created Index and there is no SORT stage.
Query
db.movies.find({‘type’:’movie’,’tomatoes.production’:’Universal Pictures’,’runtime’:{$gt:50}}).sort({year:-1,’award.wins’:-1}).explain(‘executionStats’)
Finally, change the order of the fields in the sort section and verify the execution stats. Winning plan is showing our 1st index.
Query
db.movies.find({‘type’:’movie’,’tomatoes.production’:’Universal Pictures’,’runtime’:{$gt:50}}).sort({‘award.wins’:-1,year:-1}).explain(‘executionStats’)
Here are some key points about fields order in the query:
- The order of the query predicates does not matter
- The order of the sort fields does matter
- The order of the index keys/prefixes within the compound index matter
The query uses the index as it follows the ESR rule. For compound indexes, this rule of thumb is helpful in deciding the order of fields in the index:
- First, add those fields against which equality queries are run.
- The next fields to be indexed should reflect the sort order of the query.
- The last fields represent the range of data to be accessed.
Hope this blog helped you get a better understanding of MongoDB Indexes.
If you need consultation, subscription, technical or managed services for MongoDB inquire with our experts. Stay updated on more open source database technologies like MongoDB, Postgres, Redis, ELK and more, right here.