What Are the Components of a Sharded Cluster in MongoDB?


A sharded cluster in MongoDB is composed of three components: shards, config servers, and mongos routers. Each is described below.

Shards

Each shard stores a subset of the sharded cluster's data; taken together, the shards hold the entire data set of the cluster. For deployment, you should configure each shard as a replica set in order to achieve high availability and redundancy. Connecting directly to a shard is reserved for local maintenance and administrative operations.

Primary Shard

Every database in a sharded cluster has its own primary shard, which stores all of that database's unsharded collections. Bear in mind that there is no link between the primary shard and the primary member of a replica set, so do not be confused by the similarity in their names.

A primary shard is chosen by mongos when a new database is created: mongos picks the shard that currently contains the least data.

It is possible to change the primary shard with the "movePrimary" command. However, do not change your primary shard casually: the migration is a time-consuming procedure, and while it runs you cannot use the collections of that shard's database. Likewise, cluster operations can be disrupted, and the extent of this disturbance depends on the amount of data being migrated.

To get a general view of the cluster in terms of sharding, you can use the sh.status() method via the mongo shell. The output identifies the primary shard of each database, along with other useful information such as how the chunks are distributed among the shards.
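
For example (run from a mongo shell connected to a mongos instance):

sh.status()
// The output lists each shard, the databases with their primary shard,
// and how chunks are distributed across the shards.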

Config Servers

The sharded cluster's metadata is stored in the config servers. This metadata describes the organization of the components in the cluster, the state of those components, and the cluster's data: the config servers maintain information about every shard and its chunks.

The mongos instances cache this metadata and use it to route read and write operations to the appropriate shards. When the metadata is updated, the mongos instances refresh their cache.

Note that MongoDB's authentication configuration, including internal authentication and RBAC (role-based access control), is also stored in the config servers. Additionally, MongoDB uses them for the management of distributed locks.

A single set of config servers must not be shared across sharded clusters: if you have multiple sharded clusters, use a separate set of config servers for each of them.

Config Servers and Read/Write Operations

Write Operations

If you are a MongoDB user, then you are probably familiar with the admin and config databases. Both are maintained on the config servers. The "admin" database stores the collections associated with authorization and authentication, along with the other system collections. The "config" database stores the metadata of the sharded cluster.

When the metadata is modified, for example when a chunk is split or migrated, MongoDB directs write operations to the config database using the "majority" write concern.

However, as a developer, you should refrain from writing to the config database yourself in the course of maintenance or standard operations.

Read Operations

MongoDB reads from the admin database for operations associated with authorization, authentication, and other internal tasks.

When the metadata is modified, for example when a chunk is being migrated or a mongos starts up, MongoDB issues read operations against the config database using the "majority" read concern. Moreover, it is not only mongos that reads from the config servers: shards also read from them for metadata about chunks.

Mongos

The mongos instances route queries and shard-related write operations. For the application, mongos is the only available interface to the sharded cluster: there is no direct communication between applications and shards.

Mongos tracks the contents of each shard by caching metadata from the config servers, and uses that metadata to route operations from applications to the appropriate mongod instances. Mongos instances hold no persistent state, which keeps their consumption of resources as low as possible.

Usually, mongos instances run on the same systems that house the application servers. However, if your situation demands it, you can also run them on other dedicated resources, such as the shards themselves.

Routing and Results

While routing a query, a mongos instance determines the list of shards that must receive the query and establishes a cursor on each of those shards.

Afterward, the mongos instance merges the data from these shards into the result document. Some query modifiers, such as sorting, may need to be executed on the shards before mongos retrieves the results. To manage the query modifiers, mongos does the following.

  • In the case of unsorted query results, mongos applies a "round-robin" strategy to gather results from the shards.
  • If the result size is restricted by the limit() method, mongos passes the limit to the shards and then re-applies it to the merged result before returning it to the client.

If the skip() method is used in the query, then unlike the previous case, mongos cannot forward the skip to the shards. Instead, it retrieves the unskipped results from the shards and applies the specified skip while assembling the complete result.
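
As a sketch of how these rules play out, assume a sharded collection named "orders" (the name is illustrative):

db.orders.find().sort( { total: -1 } ).limit(5)
// mongos passes limit(5) to each shard, merges the sorted per-shard
// results, and re-applies the limit before returning 5 documents.

db.orders.find().skip(100).limit(10)
// mongos cannot push skip(100) down to the shards; it retrieves the
// unskipped results and applies the skip while assembling the result.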

What Are Replica Set Members in MongoDB?


Among the many database concepts out there, one of the most important is replication. Replication copies data between multiple systems so that each carries the same information. This improves data availability, which in turn improves application performance. It also helps with backups, for instance when one of the systems is damaged by a cyberattack. In MongoDB, replication is achieved with grouped mongod processes known as a replica set.

What Is a Replica Set?

A replica set is a group of mongod processes working together. Replica sets are the key to redundancy and high availability of data in MongoDB. Members of a replica set fall into three categories.

  1. Primary member.
  2. Secondary member.
  3. Arbiter.

Primary Member

There can only be one primary member in a replica set; it is the "leader" among the members. What distinguishes the primary is that it receives all write operations: when MongoDB has to process writes, it forwards them to the primary member, which records them in its operation log (oplog). The secondary members then read the oplog and apply the recorded operations to their own copies of the data.

All members of the replica set can serve read operations, but by default MongoDB directs reads to the primary member.
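
This default can be overridden with a read preference. A quick illustration (the collection name is illustrative):

db.orders.find().readPref("secondary")          // read only from secondaries
db.orders.find().readPref("secondaryPreferred") // prefer a secondary, else the primary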

All members of a replica set signal their availability through a "heartbeat". If a member's heartbeat is missing for the specified time limit (usually defined in seconds), the member is considered disconnected. A primary member can become unavailable for many reasons, for instance when the data center in which it resides loses power.

Since replication cannot operate without a primary member, an “election” is held to appoint a primary member from the secondary members.

Secondary Member

A secondary member depends on the primary member to maintain its data, replicating the primary's oplog asynchronously. Unlike primaries, a replica set can have multiple secondary members—the more the merrier. However, as explained before, write operations go only to the primary member, so a secondary member is not allowed to process them.

If a secondary member is allowed to vote, then it is eligible to trigger an election and take part as a candidate to become the new primary member.

A secondary member can also be customized to achieve the following (a configuration sketch follows the list).

  • Made ineligible to become the primary member. Such a member is valuable as a standby, for example in a secondary data center.
  • Hidden from applications so that they cannot read from it. This lets the member serve special applications that require detachment from the standard traffic.
  • Configured to keep a snapshot of the data for historical purposes. This is a backup strategy that can serve as a contingency plan, for example when a user deletes a database by mistake.
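
A minimal configuration sketch for such a customized secondary, assuming the member at index 2 of the configuration is the one being adjusted:

cfg = rs.conf()
cfg.members[2].priority = 0  // ineligible to become the primary
cfg.members[2].hidden = true // invisible to applications, so no reads are routed to it
rs.reconfig(cfg)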

Arbiter

An arbiter is distinct in that it does not store its own copy of the data, and it is ineligible to become the primary member. So why are arbiters used? They make elections efficient: an arbiter participates with a single vote and breaks the possibility of a "tie" between members.

For example, suppose a replica set has 5 members. If the primary becomes unavailable and an election is held in which two secondary members receive two votes each, neither can become the primary. Adding an arbiter breaks such standoffs: its vote completes the election and a primary member is selected.

It is also possible to add another secondary member to improve elections, but a new secondary is heavyweight since it stores and maintains data. Arbiters carry no such overhead, which makes them the most cost-effective solution.

It is recommended that arbiters never reside on sites that also host the primary or secondary members.

If the authorization configuration option is enabled, the arbiter also exchanges credentials with the other members of the replica set, authenticating via "keyfiles". In such deployments, MongoDB encrypts the entire authentication procedure.

The use of cryptographic techniques keeps this authentication secure from exploitation. However, since arbiters do not store data, they cannot hold the internal table that contains the users and roles used for authentication. Therefore, the localhost exception is used to authenticate to the arbiter.

Do note that in recent MongoDB versions (3.6 onwards), arbiters have a priority of 0; if you upgrade a replica set from an older version, the arbiter's priority value is changed from 1 to 0.

What Is the Ideal Replica Set?

There is a limit on the number of members in a replica set: at most 50, of which no more than 7 may be voting members. To allow a member to vote, the members[n].votes setting of the replica set configuration must be 1.

Ideally, a replica set should have at least three data-bearing members: one primary and two secondaries. It is also possible to run a three-member replica set with one primary, one secondary, and an arbiter, but at the expense of redundancy.
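
As a sketch, such a three-member replica set could be initiated like this (the set name and host names are illustrative):

rs.initiate( {
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb0.example.net:27017" },
    { _id: 1, host: "mongodb1.example.net:27017" },
    { _id: 2, host: "mongodb2.example.net:27017" }
  ]
} )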

How to Work with Data Modeling in MongoDB with an Example


Were you assigned to work on the MEAN Stack? Did your organization choose MongoDB for data storage? As a beginner in MongoDB, it is important to familiarize yourself with data modeling in MongoDB.

One of the major considerations for data modeling in MongoDB is to assess the database engine's performance, balance the requirements of the application, and think about data retrieval patterns. As a beginner, think about how your application queries, updates, and processes its data.

MongoDB Schema

The schema model of MongoDB's NoSQL approach differs considerably from relational SQL databases. In the latter, you must design and specify the schema of a table before you can insert rows into it. There is no such requirement in MongoDB.

MongoDB collections operate on a different strategy: two documents (roughly analogous to rows in SQL databases) in the same collection are not required to have the same schema. A document in a collection does not have to adhere to a fixed set of fields and data types.

If you need to add fields to a document or modify its existing fields, you simply write the document with its new structure.
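
For instance, both of the following inserts are valid in the same collection (collection and field names are illustrative), because documents need not share a schema:

db.people.insertOne( { name: "Ada", email: "ada@example.com" } )
db.people.insertOne( { name: "Alan", phones: [ "555-0100", "555-0199" ], active: true } )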

How to Improve Data Modeling

During data modeling in MongoDB, you must analyze your data and queries and consider the following tips.

Capped Collections

Think about how your application is going to use the database. For instance, if your application will perform a very high volume of insert operations, you can benefit from capped collections.
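
A capped collection can be created as follows (the name and sizes are illustrative); once full, it overwrites its oldest documents, which suits high-volume insert workloads such as logging:

db.createCollection( "log", { capped: true, size: 1048576, max: 10000 } )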

Manage Growing Documents

Some write operations make documents grow, for example when new elements are pushed onto an array. It is good practice to keep track of the growth of your documents when modeling data.

Assessing Atomicity

Operations in MongoDB are atomic at the document level. A single write operation can change the contents of no more than one document atomically. Some write operations modify multiple documents, but behind the scenes they process one document at a time. Therefore, make sure you factor your atomicity requirements into the model.

The use of embedded data models is often recommended where atomic operations are needed. If you use references instead, your application is forced to issue separate read and write operations.

When to Use Sharding?

Sharding is an excellent option for horizontal scaling. It is advantageous when you have to deploy large datasets with a heavy load of read and write operations. Sharding partitions a collection's documents across the shards, allowing resources to be used more efficiently.

To manage and distribute data in MongoDB, you are required to create a shard key. The shard key has to be selected carefully, or it can have an adverse impact on the application's performance; the right shard key enables query isolation and a notable uplift in write capacity. It is imperative that you take your time choosing which field serves as the shard key.
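
A sketch of what this looks like, with illustrative database, collection, and shard key names:

sh.enableSharding("mydb")
// A hashed shard key spreads writes evenly across shards:
sh.shardCollection("mydb.users", { userId: "hashed" })
// A ranged shard key keeps related documents on the same shard:
sh.shardCollection("mydb.orders", { customerId: 1 })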

When to Use Indexes?

The first option to improve query performance in MongoDB is an index. To decide what to index, go through all your queries and look for the fields that are used most often. Make a list of them and build your indexes from it; you should then notice a considerable improvement in query performance. Bear in mind that indexes consume space both in RAM and on disk, so keep these costs in mind when creating indexes, or the strategy can backfire.
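
For example, if a field keeps appearing in your queries (the names here are illustrative), index it and verify the effect with explain():

db.orders.createIndex( { customerId: 1 } )
db.orders.find( { customerId: 42 } ).explain("executionStats")
// Look for an IXSCAN stage instead of COLLSCAN, and compare totalDocsExamined.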

Limit the Number of Collections

Sometimes application requirements make separate collections necessary. At other times, you may have used two collections when a single one would have sufficed. Since each collection comes with its own overhead, avoid overusing them.

Optimize Small Documents in Collections

If you have one or more collections holding a large number of small documents, you can achieve better performance with an embedded data model. To roll several small documents up into a larger one, look for a logical relationship between them that supports such grouping.

Each document in MongoDB carries its own overhead. The overhead for a single document may not be much, but across many documents it can make matters worse for your application. To optimize your collections, you can use these techniques.

  • If you have worked with MongoDB, you will have noticed that it automatically creates an "_id" field for each document and assigns it a unique 12-byte "ObjectId", which it indexes. For small documents this can waste storage. To optimize the collection, supply your own value for the "_id" field when you create a document: this reuses a field you would otherwise have to store separately. There is no explicit restriction on the value you use for "_id", but do not forget that it serves as the primary key in MongoDB, so the value must be unique.
  • MongoDB documents store the names of all their fields, which does not bode well for small documents, where field names can account for a large portion of the total size. Therefore, make sure you use short but relevant field names. For instance, suppose there is the following field.

father_name: "Martin"

Then you can change it by writing the following.

f_name: "Martin"

At first glance, you have only removed five letters, but for the application you have saved a significant number of bytes used to represent the field. Doing the same throughout your documents can make a noticeable difference.

Let's see how this works with an example.

The major choice you have to think about when modeling data in MongoDB is what your document structure should be. You have to decide which relationships represent your data: whether you are dealing with a one-to-one relationship or a one-to-many relationship. To do this, you have two strategies.

Embedded Data Models

MongoDB allows you to "embed" related data in the same document. The alternative term for embedded data models is de-normalized models. Consider the following format for embedded data models. In this example, the document begins with two standard fields, "_id" and "name", representing the details of a user. However, for contact information and residence, one field is not enough, since each holds more than one attribute. Therefore, you can use the embedded data model to add a document within a document: for "contact_information", curly brackets turn the field into a document containing "email_address" and "office_phone". The same technique is applied to the user's residence.

{
  _id: "abc",
  name: "Bruce Wayne",
  contact_information: {
    email_address: "bruce@ittechbook.com",
    office_phone: "(123) 456-7890"
  },
  residence: {
    country: "USA",
    city: "Bothell"
  }
}

The strategy of storing related data in the same document is useful for a reason: it limits the number of queries needed for standard operations. The question on your mind might be: when is the correct time to use an embedded data model? Two scenarios are appropriate for embedded data models.

Firstly, whenever you find a one-to-many relationship within an entity, it is wise to use embedding. Secondly, whenever you identify a "contains" relationship between entities, it is also a good fit for this data model. In embedded documents, you can access information using dot notation.
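
For instance, assuming the user document above lives in a "users" collection, a query can reach into the embedded document with dot notation:

db.users.find( { "contact_information.email_address": "bruce@ittechbook.com" } )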

One-to-Many Relationship with Embedding

One-to-many is a common type of database relationship in which a document in collection A can match multiple documents in collection B, while each document in B matches only one document in A.

Consider the following example to see when de-normalized data models hold an advantage over normalized ones. To select one, you have to consider document growth, access frequency, and similar factors.

In this example, the reference-based model uses three separate documents.

{
  _id: "barney",
  name: "Barney Peters"
}

{
  user_id: "barney",
  street: "824 XYZ",
  city: "Baltimore",
  state: "MD",
  zip: "xxxxx"
}

{
  user_id: "barney",
  street: "454 XYZ",
  city: "Boston",
  state: "MA",
  zip: "xxxxx"
}

If you look closely, it is obvious that the application has to issue additional queries to retrieve the address documents whenever it needs a user's addresses. As a result, several queries are wasted.

On the other hand, embedding the address data boosts the application's efficiency: only a single query is required.

{
  _id: "barney",
  name: "Barney Peters",
  addresses: [
    {
      street: "763 UIO Street",
      city: "Baltimore",
      state: "MD",
      zip: "xxxxx"
    },
    {
      street: "102 JK Street",
      city: "Boston",
      state: "MA",
      zip: "xxxxx"
    }
  ]
}

Normalized Data Models (References)

In normalized data models, you use references to represent relationships between documents. To understand references, consider the following example.

References, or normalized data models, are used for one-to-many and many-to-many relationships. In some instances of the embedded model, we would have to repeat data that references let us avoid.


In this example, the "_id" value of the user document is referenced by two other documents, which store it in a shared field.

References are suitable for hierarchical datasets, and they can describe many-to-many relationships.

While references provide greater flexibility than embedding, they also require applications to issue follow-up queries to resolve them. Put simply, references mean more round trips between the client application and the server.

One-to-Many Relationship with References

There are some cases in which a one-to-many relationship is better off with references rather than embedding. In these scenarios, embedding can cause needless repetition.

{
  title: "MongoDB Guide",
  author: [ "Mike Jones", "Robert Johnson" ],
  published_date: ISODate("2019-01-01"),
  pages: 1200,
  language: "English",
  publisher: {
    name: "ASD Publishers",
    founded: 2009,
    location: "San Francisco"
  }
}

{
  title: "Java for Beginners",
  author: "Randall James",
  published_date: ISODate("2019-01-02"),
  pages: 800,
  language: "English",
  publisher: {
    name: "BNM Publishers",
    founded: 2005,
    location: "San Francisco"
  }
}

As you can see, whenever a query requires the information of a publisher, that information is repeated. By leveraging references, you can improve performance by storing the publisher information in a separate collection. Where each publisher has a limited number of books and little growth is expected, keeping references to the books inside the publisher document can work well. For instance,

{
  name: "ASD Publishers",
  founded: 2009,
  location: "San Francisco",
  books: [101, 102, …]
}

{
  _id: 101,
  title: "MongoDB Guide",
  author: [ "Mike Jones", "Robert Johnson" ],
  published_date: ISODate("2019-01-01"),
  pages: 1200,
  language: "English"
}

{
  _id: 102,
  title: "Java for Beginners",
  author: "Randall James",
  published_date: ISODate("2019-01-02"),
  pages: 800,
  language: "English"
}

What Are Text Indexes in MongoDB?


MongoDB offers text indexes for search queries on string content. A collection in MongoDB can have at most one text index. So the question is: how do you build a text index?

Like other indexes, a text index is created with the db.collection.createIndex() method, and it can be built on a string array too. To include a field in a text index, specify the string "text" instead of a sort order, as in the following instance.

db.employees.createIndex( { name: "text" } )

In this example, a text index has been built on the "name" field. Other fields can be added to the same index in the same way.

Weights

While working with text indexes, you must become familiar with the concept of weight. The weight of an indexed field marks its importance relative to the other indexed fields when computing the score for a text search.

For each indexed field of a document, MongoDB multiplies the number of matches by the weight and sums the results. It then uses this sum to compute the document's score.

By default, each indexed field carries a weight of 1. Weights can be modified in the db.collection.createIndex() method.
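
For example, a text index over two fields (the names are illustrative) can weight one field higher so matches there contribute more to the score:

db.blog.createIndex(
  { content: "text", keywords: "text" },
  { weights: { content: 10, keywords: 5 }, name: "BlogTextIndex" }
)
// A match in "content" now counts twice as much as a match in "keywords".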

Wildcard Specifier

MongoDB also offers a wildcard specifier ($**). When this specifier is used with a text index, the result is referred to as a wildcard text index, which indexes every field holding string data in each of the collection's documents. A wildcard text index can be defined as follows.

db.collection.createIndex( { "$**": "text" } )

Basically, wildcard text indexes are text indexes that work on more than a single field. To govern the ranking of query results, weights can be specified for certain fields while building text indexes.

Case Insensitivity

Version 3 of the text index (the latest) supports the simple S and common C case foldings, as well as the special T case folding found in Turkish.

Case insensitivity is further improved with support for diacritic insensitivity (a diacritic is a mark indicating a different pronunciation), such as É and é. This means the text index does not differentiate between e, E, é, and É.

Tokenization

For purposes related to tokenization, version 3 of the text index supports the following delimiters.

  • Hyphen
  • Dash
  • Quotation_Mark
  • White_Space
  • Terminal_Punctuation

For instance, if a text index finds the following text string, then it would treat spaces and “« »” as delimiters.

«Messi est l’un des plus grands footballeurs de tous les temps»

Sparse

By default, text indexes are "sparse": they do not need to be explicitly defined with the sparse option. When a document lacks the field covered by a text index, MongoDB does not create an entry for that document in the text index. The document is still inserted, but no addition is made to the text index.

Limitations

Earlier, we mentioned that a collection can have no more than one text index. There are some further limitations on text indexes.

  • When a query entails the $text operator, it is not possible to use the hint() method.
  • Sort operations cannot use the ordering of a text index, even if it is part of a compound index.
  • A text index key can take part in a compound index, with some limitations. For example, special index types such as geospatial indexes cannot appear in a compound index alongside a text key.

Moreover, when building a compound text index, all text index keys must be listed adjacently in the index specification document.

Lastly, if keys precede the text index key, then to execute a search with the $text operator, the query predicate must include equality match conditions on the preceding keys.

  • To drop a text index, you must pass the index name to the db.collection.dropIndex() method. If you do not know the index name, the db.collection.getIndexes() method can be used to look it up.
  • Text indexes do not support collation; they only support simple binary comparison.

Performance and Storage Constraints

While using text indexes, it is necessary to understand their impact on the performance and storage of your application.

  • Creating a text index is similar to creating a large multikey index, and it takes considerably longer than building a simple ordered index.
  • Depending on the application, text indexes can be enormous. They contain one index entry for every unique post-stemmed word in every indexed field of every inserted document.
  • Text indexes do not store information about a word's proximity to other words within a document. Therefore, phrase queries run much better when the complete collection fits into RAM.
  • Text indexes affect the performance of inserts, because MongoDB has to add an index entry for every post-stemmed word in the indexed fields of each newly inserted document.
  • When creating a large text index on an existing collection, make sure you have a sufficiently high limit on open file descriptors.

$text Operator

The $text operator conducts a text search on fields covered by a text index. An expression with the $text operator has the following components.

  • $search – One or more terms that MongoDB parses and uses to query the text index.
  • $language – An optional component specifying the language for the tokenizer, stemmer, and stop words.
  • $caseSensitive – An optional component that turns case-sensitive search on or off.
  • $diacriticSensitive – An optional component that turns diacritic-sensitive search on or off.
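
Putting these together, here is a sketch of a search that also sorts by relevance score (the collection name is illustrative):

db.articles.find(
  { $text: { $search: "coffee shop", $caseSensitive: false } },
  { score: { $meta: "textScore" } }
).sort( { score: { $meta: "textScore" } } )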

An Introduction to Multikey Indexes with Examples


After understanding single-field and compound indexes, it is time to learn about multikey indexes. When a field holds an array value, MongoDB generates an index key for each of the array's elements. Such indexes are referred to as multikey indexes. They can be used with arrays of scalar values as well as arrays of nested documents.

To generate a multikey index, you use the standard db.collection.createIndex() method. MongoDB creates a multikey index automatically whenever the indexed field is an array; there is no need to define one explicitly.

Examples

While working with a standard array, let’s suppose we have a collection “student”.

{ _id: 21, name: "Adam", marks: [ 80, 50, 90 ] }

To build an index on the “marks” field, write the following query.

db.student.createIndex( { marks: 1 } )

As the marks field is an array, this index is an example of a multikey index. All of the keys in its elements (80, 50, and 90) point to the same document.

To create a multikey index for array fields having embedded documents, let’s suppose we have a collection “products”.

{
  _id: 5,
  name: "tshirt",
  details: [
    { size: "large", type: "polo", stock: 50 },
    { size: "small", type: "crew neck", stock: 40 },
    { size: "medium", type: "v neck", stock: 60 }
  ]
}

{
  _id: 6,
  name: "pants",
  details: [
    { size: "large", type: "cargo", stock: 35 },
    { size: "small", type: "jeans", stock: 65 },
    { size: "medium", type: "harem", stock: 10 },
    { size: "large", type: "cotton", stock: 10 }
  ]
}

{
  _id: 7,
  name: "jacket",
  details: [
    { size: "large", type: "bomber", stock: 35 },
    { size: "medium", type: "leather", stock: 25 },
    { size: "medium", type: "parka", stock: 45 }
  ]
}

We can build a multikey index with the details.size and details.stock fields.

db.products.createIndex( { "details.size": 1, "details.stock": 1 } )

This index now supports queries that involve only the "details.size" field as well as queries involving both indexed fields. The following queries, for example, would benefit from the index.

db.products.find( { "details.size": "medium" } )

db.products.find( { "details.size": "small", "details.stock": { $gt: 10 } } )

Bounds in Multikey Index

Bounds represent the limits of an index scan, i.e. how much of the index must be scanned to answer a query. If a query has more than one predicate over an index, MongoDB integrates their bounds through compounding or intersection. So what do intersection and compounding mean?

Intersection

Bounds intersection means applying a logical "AND" to the bounds. For example, if one bound is [ [ 4, Infinity ] ] and another is [ [ -Infinity, 8 ] ], their intersection yields [ [ 4, 8 ] ]. MongoDB applies intersection on multikey index bounds when the $elemMatch operation is used.

 

What Is $elemMatch?

Before moving forward, let’s understand the use of $elemMatch first.

$elemMatch matches documents in which an array field has at least one element satisfying all of the query's conditions. Bear in mind that the operator cannot be used with $text or $where expressions. For a basic example, consider a "student" collection.

{ _id: 8, marks: [ 72, 75, 78 ] }

{ _id: 9, marks: [ 65, 78, 79 ] }

The following query matches only documents in which the "marks" array has at least one element that is both greater than or equal to 70 and less than 75.

db.student.find( { marks: { $elemMatch: { $gte: 70, $lt: 75 } } } )

In response, the result set comprises the following output.

{ "_id": 8, "marks": [ 72, 75, 78 ] }

Although 75 and 78 do not satisfy both conditions, 72 does, and a single matching element is enough for $elemMatch to select the document.


For instance, suppose we have a collection student with a field "name" and an array field "marks".

{ _id: 4, name: "ABC", marks: [ 1, 10 ] }

{ _id: 5, name: "XYZ", marks: [ 5, 4 ] }

To build a multikey index with the “marks” array.

db.student.createIndex( { marks: 1 } )

Now, the following query uses $elemMatch, which means the array must have at least one element that satisfies both predicates.

db.student.find( { marks : { $elemMatch: { $gte: 4, $lte: 8 } } } )

Computing the predicates one by one:

  • For the first predicate, the bounds are equal to or greater than 4: [ [ 4, Infinity ] ].
  • For the second predicate, the bounds are equal to or less than 8: [ [ -Infinity, 8 ] ].

Since $elemMatch is used here, MongoDB can intersect the bounds and combine them as follows.

marks: [ [ 4, 8 ] ]

On the other hand, if $elemMatch is not used, then MongoDB cannot apply intersection on the multikey index bounds. For instance, check the following query.

db.student.find( { marks : { $gte: 4, $lte: 8 } } )

The query matches documents where the marks array has at least one element greater than or equal to 4 "AND" at least one element less than or equal to 8. However, a single element does not have to satisfy both predicates, so MongoDB does not intersect the bounds; it uses either [ [ 4, Infinity ] ] or [ [ -Infinity, 8 ] ].

Compounding

Compounding bounds means using the bounds with a compound index. For example, suppose a compound index { x: 1, y: 1 } has a bound on the x field of [ [ 4, Infinity ] ] and a bound on the y field of [ [ -Infinity, 8 ] ]. Compounding combines them as follows.

{ x: [ [ 4, Infinity ] ], y: [ [ -Infinity, 8 ] ] }

Sometimes, MongoDB is unable to compound the given bounds. In such scenarios, it uses the bound on the leading field, which in our example is x: [ [ 4, Infinity ] ].

Now suppose indexing is applied on multiple fields, where one of the fields is an array. For instance, the collection student stores the "name" and "marks" fields.

{ _id: 10, name: "Adam", marks: [ 1, 10 ] }

{ _id: 11, name: "William", marks: [ 5, 4 ] }

Build a compound index on the "name" and "marks" fields.

db.student.createIndex( { name: 1, marks: 1 } )

In the following query, there is a condition which applies on both of the indexed keys.

db.student.find( { name: "William", marks: { $gte: 4 } } )

Computing these predicates step-by-step.

  • For the "name" field, the bounds for the "William" predicate are [ [ "William", "William" ] ].
  • For the "marks" field, the bounds for the { $gte: 4 } predicate are [ [ 4, Infinity ] ].

MongoDB can apply compounding on both of these bounds.

{ name: [ [ "William", "William" ] ], marks: [ [ 4, Infinity ] ] }


What Are Indexes in MongoDB?



When a query runs in MongoDB without an index, the server performs a collection scan: every document in the collection has to be examined to find the matching ones. Obviously, this is a wasteful tactic, since checking each document uses resources inefficiently.

To address this issue, MongoDB provides indexes. An index acts as a filter that narrows the scanning pool so queries execute more efficiently. Indexes can be categorized as a "special" type of data structure.

Indexes store a small portion of a collection's data: the values of one or more specific fields, kept in sorted order.

By default, MongoDB generates a unique index on the _id field whenever a collection is created. Due to this index, it is impossible to insert multiple documents with the same _id value. Moreover, unlike other indexes, this one cannot be dropped.

How to Create an Index?

Open your mongo shell and use the db.collection.createIndex() method to generate an index. The complete format is the following.

db.collection.createIndex( <key and index type specification>, <options> )

To create our own index, suppose we have a field for the employee name called "ename". We can generate a (descending) index on it.

db.employee.createIndex( { ename: -1 } )

Types

Indexes are classified in the following categories.

  • Single Field Indexes
  • Compound Indexes
  • Multikey Indexes
  • Text Indexes
  • 2dsphere Indexes
  • geoHaystack Indexes

 

Single Field Indexes

A single-field index is the simplest index of all. As the name suggests, it applies indexing on a single field. We begin our single-field example with a collection “student”. Now, this student collection carries documents like this:

{
  "_id": ObjectId("681b13b5bc3446894d86cd34"),
  "name": "Adam",
  "marks": 400,
  "address": { state: "TX", city: "Fort Worth" }
}

To generate an index on the “marks” field, we can write the following query.

db.student.createIndex( { marks: 1 } )

We have now generated an index that operates in ascending order. The order is set by the value you give for the field: "1" defines an ascending index, while "-1" defines a descending one. This index can now support other queries involving "marks". Some examples:

db.student.find( { marks: 2 } )

db.student.find( { marks: { $gt: 5 } } )

In the second query, you might have noticed "$gt". $gt is a MongoDB operator meaning "greater than". Similar operators are $gte (greater than or equal to), $lt (less than), and $lte (less than or equal to). We are going to use these operators heavily in the upcoming examples; they filter documents by specifying limits.

It is possible to index fields of embedded documents too, using dot notation. Continuing our "student" example,

{
  "_id": ObjectId("681b13b5bc3446894d86cd34"),
  "marks": 500,
  "address": { state: "Virginia", city: "Fairfax" }
}

We can apply indexing on the address.state field.

db.student.createIndex( { "address.state": 1 } )

Whenever users issue queries involving "address.state", this index supports them. For instance,

db.student.find( { "address.state": "FL" } )

db.student.find( { "address.city": "Chicago", "address.state": "IL" } )

Likewise, it is possible to build indexes on the complete embedded document.

Suppose you have a collection “users” which contains the following data.

{
  "_id": ObjectId("681d15b5be344699d86cd567"),
  "gender": "male",
  "education": { high_school: "ABC School", college: "XYZ University" }
}

If you are familiar with MongoDB, you know that the "education" field is what we call an "embedded document". This document contains two fields: high_school and college. To apply indexing on the complete document, we can write the following.

db.users.createIndex( { education: 1 } )

This index can be used by queries like the following. Note that an equality match on an entire embedded document requires the fields in the exact order in which they are stored.

db.users.find( { education: { high_school: "ABC School", college: "XYZ University" } } )

Compound Indexes

So far, we have used only a single field for indexing. However, MongoDB also supports multiple fields in an index; such indexes are referred to as compound indexes. Bear in mind that a compound index can contain no more than 32 fields. To generate one, follow this format, where 'f' refers to a field name and 't' to the index type.

db.collection.createIndex( { <f1>: <t1>, <f2>: <t2>, … } )

Suppose we have a collection “items” which stores these details.

{
  "_id": ObjectId(…),
  "name": "mouse",
  "category": [ "computer", "hardware" ],
  "address": "3rd Street Store",
  "quantity": 80
}

A compound index can now be applied on the "name" and "quantity" fields.

db.items.createIndex( { "name": 1, "quantity": 1 } )

Bear in mind that the order of fields in a compound index is crucial. The index first sorts documents by the "name" field, and then, within each "name" value, sorts them by "quantity".

Compound indexes support not only queries that match all the indexed fields, but also queries on any prefix of the index fields. This means the index works for queries on the "name" field alone as well as those on both "name" and "quantity"; a query on "quantity" alone, however, cannot use it. For instance,

db.items.find( { name: “mouse” } )

db.items.find( { name: “mouse”, quantity: { $gt: 10 } } )

So far, we have used ascending and descending order freely in queries. For single-field indexes this poses no issue, but for compound indexes you have to check whether your sorts can use the index. For example, suppose a collection "records" stores documents with the fields "item" and "date", and queries sort results first by "item" in ascending order and then by "date" in descending order. For instance,

db.records.find().sort( { item: 1, date: -1 } )

Queries that instead sort "item" in descending order and "date" in ascending order also work:

db.records.find().sort( { item: -1, date: 1 } )

Both of these sort operations are supported by an index created like this:

db.records.createIndex( { "item": 1, "date": -1 } )

The point to note, however, is that this index cannot support a sort that is ascending on both fields, like the following.

db.records.find().sort( { "item": 1, "date": 1 } )


Design Patterns for Functional Programming


In software circles, a design pattern is a documented approach to a problem and its solution, a problem bound to be found repeatedly across projects as a stumbling block. Software engineers customize these patterns to their problem and form a solution for their respective applications. Patterns follow a formal structure to explain a problem, a proposed answer, and key points related to either. A good pattern is one that is well known in the industry and widely used. Functional programming has several popular design patterns. Let's go over some of them.

Monad

Monad is a design pattern that takes several functions and integrates them into a single composed computation. It can be seen as a type of combinator and is a core component of functional programming. In a monad, a value is wrapped in a box; to use the wrapped value, it is unwrapped and a function is applied to it.

To get more technical, a monad rests on three basic rules.

  • A parameterized type M&lt;T&gt;

According to this rule, T can be any type, such as String or Integer; Optional&lt;T&gt; is an example of such a parameterized type.

  • A unit function T -> M&lt;T&gt;

According to this rule, there is a function that takes a plain value and wraps it in the monadic type. For instance, Optional.of(String) returns Optional&lt;String&gt;.

  • A bind operation: M&lt;T&gt; bind (T -> M&lt;U&gt;) = M&lt;U&gt;

This rule describes the bind operator (written >>= in Haskell). Bind is called on the monad: for instance, on an Optional&lt;Integer&gt; it takes a function such as Integer -> Optional&lt;String&gt; as an argument and returns a monad of a different type, Optional&lt;String&gt;.
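
As a minimal JavaScript sketch of these three rules (the Maybe type here is hand-rolled for illustration, not a library API):

// A tiny Maybe monad: the parameterized type M<T> wrapping a possibly absent value.
const Maybe = (value) => ({
  value,
  // bind: unwrap the value and pass it to fn, which returns a new Maybe.
  bind: (fn) => (value === null || value === undefined) ? Maybe(null) : fn(value)
});

// The unit function T -> M<T>.
const unit = (x) => Maybe(x);

// Chaining binds: each step returns a Maybe, so absent values short-circuit.
const result = unit(5)
  .bind(x => Maybe(x * 2))
  .bind(x => Maybe(x + 1));
// result.value -> 11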

Persistent Data Structures

In computer science, there is a concept known as a persistent data structure. Persistent data structures work like normal data structures, but they preserve their older versions after modification. This makes them effectively immutable, because operations on them do not modify the structure in place. Persistent data structures are divided into three types:

  • When all the versions of a data structure can be accessed and only the latest version can be changed, then it is a partially persistent data structure.
  • When all the versions of a data structure can be accessed as well as changed, then it is a fully persistent data structure.
  • Sometimes due to a merge operation, a new version can be generated from two prior versions; such type of data structure is known as confluently persistent.

For a data structure that does not show any persistence, the term "ephemeral" is used.

As you may have figured out, since persistent data structures enforce immutability, they are used heavily in functional programming. You can find persistent data structure implementations in all major functional programming languages. In JavaScript, for instance, Immutable.js is a library for persistent data structures. For example,

import { Map } from 'immutable';

let employee = Map({
  employeeName: 'Brad',
  age: 27
});

employee.employeeName;        // -> undefined
employee.get('employeeName'); // -> 'Brad'

Functors

In programming, containers are used to store data without assigning any methods or properties to them. We just put a value inside a container, which is then passed around using functional techniques. A container only has to store the value safely and hand it back when needed; the value inside is not modified. In functional programming, containers are a real advantage: they form the foundation of functional constructs and assist with asynchronous actions and pure functional error handling.

So why are we talking about containers? Because functors are a special type of container: a functor is a container that implements a "map" function.

Among the simplest types of containers are arrays. Consider the following line of JavaScript.

const a1 = [10, 20, 30, 40, 50];

Now, to read one of its values, we can write.

const x = a1[1];

In functional programming, the array is not mutated in place, so we avoid calls like:

a1.push(90)

Instead, new arrays are created from existing ones. Technically, whenever a container can have a unary function mapped over it, it is a functor. Here "mapped" means the container exposes a special function that applies the unary function to its contents. For arrays, that special function is map: it walks the elements of the array, applies the function to each element in turn, and returns a new array.
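
A short sketch of an array acting as a functor:

const a1 = [10, 20, 30, 40, 50];
// map applies the unary function to each element and returns a NEW array;
// the original container is left untouched.
const doubled = a1.map(x => x * 2); // -> [20, 40, 60, 80, 100]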

Zipper

A zipper is a design pattern used to represent an aggregate data structure together with a focus inside it. The pattern suits code where arbitrary traversal is common and the contents can be modified, so it is usually found in purely functional programming environments. The concept dates back to 1997, when Gérard Huet introduced it as a functional take on the "gap buffer" strategy.

Zipper is a general concept and can be adapted to data structures such as trees and lists; it is especially convenient for recursive data structures. When combined with a zipper, such a data structure is called "a list with a zipper" or "a tree with a zipper" to make it apparent that its implementation uses the zipper pattern.

In simple terms, a data structure with a zipper has a hole: the hole marks the present focus of a traversal, and it is used for manipulating and traversing the structure. The zipper lets developers move around within the data structure easily.
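
As a minimal sketch of a list zipper in JavaScript (the shape and helper names are illustrative, not a standard API):

// A zipper over a list: elements left of the focus (stored reversed),
// the focused element (the "hole"), and the elements to its right.
const fromList = ([focus, ...right]) => ({ left: [], focus, right });

const moveRight = ({ left, focus, right: [next, ...rest] }) =>
  ({ left: [focus, ...left], focus: next, right: rest });

// Replace the focused element without touching the rest of the list.
const set = ({ left, right }, value) => ({ left, focus: value, right });

let z = fromList([1, 2, 3]); // focus on 1
z = moveRight(z);            // focus on 2
z = set(z, 99);              // z -> { left: [1], focus: 99, right: [3] }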