Introduction to REST with Examples – Part 2


In the previous post, we talked about what REST APIs are and went through a few examples, using cURL for our requests. So far, we have established that a request is composed of four parts: endpoint, method, headers, and data. We have already explained the endpoint and the method; now let’s go over the headers, the data, and some more relevant information on the subject.

Headers

Headers carry information for both the server and the client. They serve a wide range of use cases, such as describing the body content or handling authentication. HTTP headers follow a property-value pair format in which a colon separates the property from the value. For instance, the following header tells the server to expect JSON content.

"Content-Type: application/json"

With cURL (which we talked about in the last post), you can send HTTP headers with the -H or --header option. For instance, to send the above-mentioned header to the GitHub API, you can write the following.

curl -H "Content-Type: application/json" https://api.github.com

To check all of the headers you sent, you can add the -v or --verbose option at the end of the request. Consider the following command as an example.

curl -H "Content-Type: application/json" https://api.github.com -v

Keep in mind that in your result, “*” indicates cURL’s additional information, “<” indicates the response headers and “>” indicates the request headers.
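If you script your requests instead of typing them in a terminal, the same header can be attached in Python. The following is only a minimal sketch using the standard library; the helper name build_request is an assumption made for illustration, and the request is constructed but never sent.

```python
import urllib.request


def build_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that carries a JSON Content-Type header."""
    # Rough equivalent of: curl -H "Content-Type: application/json" <url>
    return urllib.request.Request(url, headers={"Content-Type": "application/json"})


if __name__ == "__main__":
    request = build_request("https://api.github.com")
    # header_items() lists the headers that would go out with the request.
    # The ">" prefix mirrors how cURL marks request headers in verbose mode.
    for name, value in request.header_items():
        print(f"> {name}: {value}")
```

Note that urllib normalizes the capitalization of header names internally, which is harmless because HTTP header names are case-insensitive.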

The Data (Body)

Let’s come to the final component of a request, also known as the data, the message, or the body. It contains the information that you want to send to the server. To send data with cURL, you can use the -d or --data option in the following format.

curl -X POST <URL> -d property=value

For multiple fields, add one -d option per field, like the following.

curl -X POST <URL> -d property1=value1 -d property2=value2

It is also possible to break a request into several lines for better readability by ending each line with a backslash. Once you learn how to spin up (start) servers, you can easily create your own API and test it with any data. If you are not interested in spinning up a server, you can go to Requestbin.com and hit “create endpoint”. In response, you get a URL that you can use for testing requests. You have to generate your own request bin, and each bin has a lifespan of 48 hours. Now you can send data to your request bin by using the following.

curl -X POST https://requestb.in/1ix963n1 \
  -d name=adam \
  -d age=28

By default, cURL sends this data the same way a web page sends its form fields. To send JSON data instead, set the “Content-Type” header to “application/json”, like this.

curl -X POST https://requestb.in/1ix963n1 \
  -H "Content-Type: application/json" \
  -d '{
    "name": "adam",
    "age": "28"
  }'
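To make the difference between the two body formats concrete, here is a small Python sketch that encodes the same fields both ways. The function names form_body and json_body are hypothetical, and the form encoding shown matches what repeated -d options produce only for simple values like these.

```python
import json
from urllib.parse import urlencode


def form_body(fields: dict) -> bytes:
    """Encode fields as form data, like: -d name=adam -d age=28."""
    return urlencode(fields).encode("utf-8")


def json_body(fields: dict) -> bytes:
    """Encode fields as a JSON document, for Content-Type: application/json."""
    return json.dumps(fields).encode("utf-8")


if __name__ == "__main__":
    fields = {"name": "adam", "age": "28"}
    print(form_body(fields))  # form-encoded pairs joined by "&"
    print(json_body(fields))  # a JSON object with the same two fields
```

The data is identical in both cases; only the representation changes, which is why the server needs the Content-Type header to know how to parse it.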

And with this, the anatomy of a request is complete.

Authentication

When you send a POST request to the GitHub API, the response contains the message “Requires authentication”. What does this mean exactly?

Developers put authorization measures in place so that specific actions are only performed by the right parties; this prevents impersonation by a malicious third party. PUT, PATCH, DELETE, and POST requests change the database, so developers protect them with some sort of authentication mechanism. What about GET requests? They also need authentication, but only in some cases.

On the web, authentication is mainly performed in two ways. Firstly, there is the generic username/password authentication, known as basic authentication. Secondly, authentication can be done with a secret token. The second method includes OAuth, which lets you authenticate users through Google, Facebook, and other social media platforms. For username/password authentication, you have to use the “-u” option like the following.

curl -X POST -u "username:password" <URL>

You can test this authentication yourself. Afterward, the previous “Requires authentication” response changes to “Problems parsing JSON”. The reason is that, so far, you have not sent any data. Since it is a POST request, the server expects data in the body.
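Behind the scenes, basic authentication is just another request header. The sketch below shows, under the assumption that the default Basic scheme is in use, how the value of the Authorization header is derived from a username and password; the function name basic_auth_header is made up for this example.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Return the Authorization header value used by HTTP basic authentication."""
    # The credentials are joined with a colon and then Base64-encoded.
    credentials = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


if __name__ == "__main__":
    print("Authorization:", basic_auth_header("username", "password"))
```

Base64 is an encoding, not encryption, so basic authentication should only be used over HTTPS.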

HTTP Error Messages and Status Codes

The above-mentioned messages like “Problems parsing JSON” or “Requires authentication” fall into the category of HTTP error messages. They appear whenever a request has an issue. HTTP status codes let you learn the status of your response instantly. These codes range from the 100s to the 500s.

  • The success of your request is signified by 200+.
  • The redirection of the request to another URL is signified by 300+.
  • If the client causes an error, then the code is 400+.
  • If the server causes an error, then the code is 500+.

To debug a response’s status, you can use the head (-I) or verbose (-v) options. For instance, if you add “-I” to a POST request and do not provide the username/password details, the response shows a 401 status code. When your request is flawed because of incorrect or missing data, a 400 status code appears.
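The ranges above are easy to turn into code. The following Python sketch maps a status code to its class; the function name status_class and the label strings are illustrative choices, and the 100+ range (informational responses) is included for completeness.

```python
def status_class(code: int) -> str:
    """Map an HTTP status code to the class it belongs to."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    labels = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # The first digit of the code decides the class.
    return labels[code // 100]


if __name__ == "__main__":
    for code in (200, 301, 400, 401, 500):
        print(code, status_class(code))
```

A client can use a check like this to decide whether to retry (server error), fix the request (client error), or follow a new URL (redirection).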

Versions of APIs

Developers upgrade their APIs time and again; it is an ongoing process. When too many modifications are required, the developers should consider creating a new version. When this occurs, your application may start to fail, because you wrote your code against the previous version of the API while your requests now point to the brand-new one.

In order to perform a request for a certain version of the API, there are two methods. Depending on your API’s structure, you can choose any of them.

  • Use the endpoint.
  • Use the request header.

Twitter, for instance, follows the first strategy. A website can follow it in this way:

https://api.abc.com/1.1/account/settings.json

On the other hand, GitHub takes advantage of the second method. For instance, consider the following, where API version 4 is requested in the request header.

curl https://api.abc.com -H "Accept: application/abc.v4+json"

What Is a Bean in Spring? What Are the Scope and Lifecycle of a Bean?



To learn Spring for Java, it is necessary to understand what a bean is, along with its scope and lifecycle.

What Is a Bean in Spring

In Spring, the IoC container manages the objects which power the entire application. These objects are known as beans. Every object which the Spring IoC container instantiates, assembles, and manages falls into the category of a bean.

You can create a bean by providing the configuration metadata to your container. A bean has the following properties.

  • class: It is a mandatory attribute and specifies the bean class which is used to create the bean.
  • name: It specifies a unique identifier for the bean. If you are working with XML-based configuration metadata, you can use the id and/or name attributes to specify the bean identifier.
  • scope: It specifies the scope of the objects which are created from a particular bean definition.
  • constructor-arg: It is used for dependency injection.
  • properties: It is used for dependency injection.
  • autowiring mode: It is used for dependency injection.
  • lazy-initialization mode: A lazy-initialized bean tells the IoC container to create the bean instance when it is first requested, as opposed to creating it at startup.
  • initialization method: It is a callback which is called just after the IoC container has set all the necessary properties on the bean.
  • destruction method: It is a callback which is used when the container which holds the bean is destroyed.

Scope Of Bean

While specifying a bean, you can also define its scope. For instance, you can set a bean’s scope to “prototype”; this means that Spring has to generate a new instance of the bean whenever it is needed. On a similar note, you can use the “singleton” scope if Spring should return the same bean instance every time it is needed. In total, there are five bean scopes.

  • singleton: It scopes the bean definition to a single instance for each Spring IoC container.
  • prototype: It scopes a single bean definition to any number of object instances.
  • request: It scopes a bean definition to an HTTP request.
  • session: It scopes a bean definition to an HTTP session.
  • global-session: It scopes a bean definition to a global HTTP session.

Singleton

When a bean’s scope is defined as singleton, precisely one object instance is generated by the Spring IoC container. The instance is then saved in a cache which stores all the singleton-scoped beans. Afterward, any reference or request for that bean returns the cached object. By default, the scope of a bean is singleton. In case you want to state explicitly that you require a single bean instance, set the scope attribute to singleton. For instance, consider the following format.

<!-- Using the singleton scope to define a bean -->
<bean id="..." class="..." scope="singleton">
    <!-- write the configuration and collaborators of the bean here -->
</bean>

Prototype

When the scope is specified as prototype, a new bean instance is generated by the Spring IoC container whenever a request is made for that bean. As a rule of thumb, use the prototype scope when your beans are stateful and the singleton scope when they are stateless. To specify a prototype scope, you can use the following format.

<!-- Use the prototype scope to define a bean -->
<bean id="..." class="..." scope="prototype">
    <!-- write the configuration and collaborators of the bean here -->
</bean>
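The difference between the two scopes comes down to whether the container caches the object. The following Python sketch imitates that behavior with a toy container. It is not Spring code: the class ToyContainer and its register and get_bean methods are invented purely to illustrate the idea.

```python
from typing import Any, Callable, Dict, Tuple


class ToyContainer:
    """A toy stand-in for an IoC container; it only models singleton vs. prototype."""

    def __init__(self) -> None:
        self._definitions: Dict[str, Tuple[Callable[[], Any], str]] = {}
        self._singletons: Dict[str, Any] = {}

    def register(self, name: str, factory: Callable[[], Any], scope: str = "singleton") -> None:
        if scope not in ("singleton", "prototype"):
            raise ValueError(f"unsupported scope: {scope}")
        self._definitions[name] = (factory, scope)

    def get_bean(self, name: str) -> Any:
        factory, scope = self._definitions[name]
        if scope == "prototype":
            # Prototype: build a fresh instance on every request.
            return factory()
        # Singleton: build once, cache it, and return the cached object afterwards.
        if name not in self._singletons:
            self._singletons[name] = factory()
        return self._singletons[name]


if __name__ == "__main__":
    container = ToyContainer()
    container.register("settings", dict)                 # singleton by default
    container.register("cart", list, scope="prototype")  # new object per request
    print(container.get_bean("settings") is container.get_bean("settings"))
    print(container.get_bean("cart") is container.get_bean("cart"))
```

Because every caller shares the singleton, any state stored in it is shared too, which is why stateful beans are better served by the prototype scope.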

Bean Life Cycle

After a bean is instantiated, it may need to perform some initialization to get into a usable state. Similarly, when the bean is no longer needed and is removed from the container, some cleanup may be required. In this post, we discuss two of the most important lifecycle callback methods for beans.

To begin with, you can declare a <bean> with the init-method or destroy-method parameters. The init-method attribute specifies a method which is called on the bean immediately after its instantiation. The destroy-method attribute specifies a method which is invoked just before the bean is removed from the container.

Initialization Callbacks

The org.springframework.beans.factory.InitializingBean interface specifies a single method.

void afterPropertiesSet() throws Exception;

Therefore, you can implement this interface and perform the initialization work by using the following format.

import org.springframework.beans.factory.InitializingBean;

public class BeanExampleOne implements InitializingBean {

    @Override
    public void afterPropertiesSet() {
        // write the code for initialization
    }
}

If you are using XML-based configuration metadata, you can use the “init-method” attribute to specify the name of a method which has a void, no-argument signature.

<bean id="beanExampleOne" class="examples.BeanExampleOne" init-method="init"/>

Consider the following definition for your class.

public class BeanExampleOne {

    public void init() {
        // write the code for initialization
    }
}

Destruction Callbacks

The org.springframework.beans.factory.DisposableBean interface specifies a single method.

void destroy() throws Exception;

Therefore, you can implement this interface and perform the destruction work by using the following format.

import org.springframework.beans.factory.DisposableBean;

public class BeanExampleTwo implements DisposableBean {

    @Override
    public void destroy() {
        // write the code for destruction
    }
}

If you are using XML-based configuration metadata, you can use the “destroy-method” attribute to specify the name of a method which has a void, no-argument signature.

<bean id="beanExampleTwo" class="examples.BeanExampleTwo" destroy-method="destroy"/>

Consider the following definition for your class.

public class BeanExampleTwo {

    public void destroy() {
        // write the code for destruction
    }
}

When you are using a Spring IoC container in a non-web application environment, for instance, in a desktop environment, you can register a shutdown hook with the Java Virtual Machine. Doing so ensures a graceful shutdown and invokes the relevant destroy methods.
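The order of these callbacks can be illustrated outside of Spring as well. The sketch below is a toy model in Python, not Spring’s implementation: ToyLifecycleContainer, BeanExample, and their methods are hypothetical names, and Python’s atexit module stands in for the JVM shutdown hook.

```python
import atexit
from typing import List


class ToyLifecycleContainer:
    """Toy container: runs init() when a bean is registered and destroy() on close."""

    def __init__(self) -> None:
        self._beans: List[object] = []
        self._closed = False

    def register(self, bean: object) -> object:
        # Analogue of init-method: called once, right after the bean is created.
        init = getattr(bean, "init", None)
        if callable(init):
            init()
        self._beans.append(bean)
        return bean

    def close(self) -> None:
        # Analogue of destroy-method: called before the beans are dropped.
        if self._closed:
            return
        self._closed = True
        for bean in reversed(self._beans):
            destroy = getattr(bean, "destroy", None)
            if callable(destroy):
                destroy()
        self._beans.clear()

    def register_shutdown_hook(self) -> None:
        # Analogue of a JVM shutdown hook: close the container at process exit.
        atexit.register(self.close)


class BeanExample:
    """A bean that records which lifecycle callbacks were invoked."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def init(self) -> None:
        self.events.append("init")

    def destroy(self) -> None:
        self.events.append("destroy")


if __name__ == "__main__":
    container = ToyLifecycleContainer()
    bean = container.register(BeanExample())
    container.register_shutdown_hook()
    print(bean.events)
```

The close method is written to be safe to call twice, so an explicit close followed by the exit hook does not run the destroy callbacks a second time.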

It must be noted that the use of the InitializingBean or DisposableBean callbacks is not recommended, as the XML configuration allows a greater degree of flexibility in naming your methods.

Are Microservices the Right Fit For You?


The term microservices was originally coined in 2011. Since then, it has been on the radar of modern development organizations. In the following years, this software architecture gained traction in various IT circles. According to a survey, around 36 percent of enterprises were using microservices, while 26 percent were considering adopting them in the future.

So, why exactly should you use microservices in your company? There has to be something unique and more rewarding in them to compel you to leave your traditional architecture in their favor. Consider the following reasons to decide for yourself.

Enhance Resilience

Microservices help to decouple and decentralize your complete application into multiple services. These services are distinct because they operate independently of each other. As opposed to the conventional monolithic architecture, in which a code failure can disrupt more than one function or service, there is little to no chance that the failure of a single service affects another. Moreover, even if you have to perform maintenance on multiple systems, your users will not notice it.

More Scalability

In a monolithic architecture, when developers have to scale a single function, they have to tweak and adjust other functions as well. Perhaps one of the biggest advantages of microservices is the scalability which they bring to the table. Since all the services in a microservices architecture are separate, it is possible to scale one service or function without having to scale up the complete application. You can deploy critical business services on different servers to improve the performance and availability of your application while your other services remain unaffected.

Right Tool for the Right Task

Microservices ensure that you are not pigeonholed by a single vendor. They give your projects greater flexibility: rather than trying to make things work with a single tool, you can look for the right tool to fit your requirements. Each of your services can use its own framework, programming language, technology stack, or ancillary services. Despite this heterogeneity, the services can still communicate and connect easily.

Promotion of Services

In microservices, there is no need to rewrite and adjust the complete codebase if you have to change a feature or incorporate a new one in your application. This is because microservices are ‘loosely coupled’; you only have to modify the single service concerned. Coding your project in smaller increments lets you test and deploy them independently. In this way, you can promote your services and application quickly, as soon as you complete one service after another.

Maintenance and Debugging

Microservices can help you to test and debug applications easily. Working with smaller modules that are continuously tested and delivered means that you can create applications free from bugs and errors, thereby improving the reliability and quality of your projects.

Better ROI

With microservices, your resource optimization improves. They allow different teams to work on independent services. As a result, the time needed to deploy is reduced. Moreover, development time also decreases significantly, and you can achieve greater reusability in your project. The decoupling of services also means that you do not have to spend much on high-priced machines; standard x86 machines are enough. The efficiency which you get from microservices can minimize infrastructure costs along with downtime.

Continuous Delivery

While working with a monolithic architecture, dedicated teams are needed to code discrete modules like the front-end, back-end, database, and other parts of the application. On the other hand, microservices allow project managers to bring in cross-functional teams who can manage the application lifecycle through a fully continuous delivery model. When the development, operations, and testing teams work on a single service at the same time, debugging and testing become quicker and easier. This strategy can help you to develop, test, and deploy your code ‘continuously’. Moreover, you do not always have to write new code; instead, you can build on existing libraries.

Considerations before Deciding to Use Microservices

If you have decided to use a microservices-based architecture, then review the following considerations.

The State of Your Business

To begin with, you have to consider whether your business is big enough to warrant having your IT team work on complex projects independently. If it is not, then it is better to avoid microservices.

Assess the Deployment of Components

Analyze the components and functions of your software. If your project deploys two or more components which are completely separate from each other in terms of business processes and capabilities, then it is a wise option to use microservices.

Decide if Your Team Is Skilled for the Project

The use of microservices allows project managers to use smaller development teams whose members are well-skilled in their respective areas of expertise. As a result, it helps to quickly build new functionalities and release them.

Before you adopt the microservices architecture, you have to make sure that your team members are well positioned to operate with continuous integration and deployment. Similarly, you have to see if they can work in a DevOps culture and are experienced enough to work with microservices. In case they are not experienced enough yet, you can focus on building a group which is able to fulfill your requirements for working with a microservices architecture. Alternatively, you can hire experienced individuals to make up a new team.

Define Realistic Roadmap

Exponential scaling is often presented as the key to success. However important it is for businesses to be agile, not all businesses need to scale. If you feel that the added complexity cannot help you much, then it is better to avoid a microservices architecture. Set some realistic goals about how your business is going to operate in the future to decide whether adopting a microservices architecture can bring you benefits.

What Is an Artificial Neural Network and How Does It Work?


The whole idea behind artificial intelligence is to make a machine act like a human being. While many sub-divisions of AI originated with their own set of algorithms to mimic humans, artificial neural networks (ANNs) are AI in its purest sense; they mimic the working of the human brain, the core and complex foundation which shapes the thinking and reasoning of human beings.

What Is an Artificial Neural Network?

An ANN is a machine learning algorithm. It is founded on scientific knowledge about organic neural networks (the working of the human brain). An ANN works quite similarly to how human beings analyze and review information. It is composed of several processing units which are linked together and process data in parallel.

As machine learning is primarily focused on “learning,” ANNs continuously learn and adapt. The processing units in ANNs are commonly referred to as neurons or nodes. Bear in mind that, in biology, a neuron is the most basic unit of the human nervous system. The nodes are linked via arcs, each of which has its own weight. An artificial neural network is made up of three types of layers.

Input

The input layer is responsible for accepting the explanatory attribute values which are collected from observations. Generally, the input nodes correspond to the explanatory variables. The input layer presents patterns to the network, and those patterns are then analyzed by the hidden layers. The input layer nodes do not modify any data. Each accepts an individual value as input and duplicates it so that it can be passed on to multiple nodes in the next layer.

Hidden

The hidden layers modify and transform the values collected from the input layer. They perform their computation on the data through a system of weighted links, or connections. The number of hidden layers depends on the artificial neural network; there may be one or more hidden layers. Nodes in these layers multiply the incoming values by weights. Weights are a set of numbers which are combined with the input values; the weighted inputs are then summed to generate an output in the form of a single number.

Output

The hidden layers are, in turn, connected to an output layer, which may also receive connections directly from the input layer. It generates a result which corresponds to the prediction of the response variable. Generally, when the machine learning task is classification, there is a single output node. The data collected in this layer is combined and transformed to generate new values.

The structure of a neural network is also called its topology or architecture. All the above layers of the ANN form the structure. The design of the structure is of utmost importance to the final results of the ANN. At its most basic, a structure is divided into two layers which are comprised of one unit each.

The output unit possesses two functions: combination and transfer. When there are multiple output units, the model behaves like a logistic or linear regression, depending on the nature of the transfer function. In that case, the ANN’s weights are actually the regression coefficients.
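The combination and transfer functions described above can be written down in a few lines. The following Python sketch passes values through one hidden layer and one output layer. It is only an illustration under simplifying assumptions: the logistic (sigmoid) function is chosen as the transfer function, there are no bias terms, and the names layer_forward and predict are invented for this example.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic transfer function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(inputs: List[float], weights: List[List[float]]) -> List[float]:
    """Compute one layer; weights[j] holds the incoming weights of node j."""
    outputs = []
    for node_weights in weights:
        # Combination: multiply each input by its weight and sum the results.
        combined = sum(w * x for w, x in zip(node_weights, inputs))
        # Transfer: turn the weighted sum into the node's output.
        outputs.append(sigmoid(combined))
    return outputs


def predict(inputs: List[float],
            hidden_weights: List[List[float]],
            output_weights: List[List[float]]) -> List[float]:
    """Input layer passes values on unchanged; hidden and output layers transform them."""
    hidden = layer_forward(inputs, hidden_weights)
    return layer_forward(hidden, output_weights)


if __name__ == "__main__":
    # Two inputs, two hidden nodes, one output node; the weights are arbitrary.
    print(predict([1.0, 2.0], [[0.5, -0.25], [0.1, 0.3]], [[1.0, -1.0]]))
```

Training an ANN means adjusting the numbers in hidden_weights and output_weights until the predictions match the observations; the sketch covers only the forward computation.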

So what do the hidden layers do? The hidden layers are incorporated into ANNs to enhance their predictive power. However, it is recommended to add them sparingly, because an excessive number of these layers may mean that the neural network memorizes all the learning data and is not able to generalize, causing an over-fitting problem. Over-fitting arises when the neural network fails to discover general patterns and relies heavily on its learning set to function.


Applications

Due to their accurate predictions, ANNs have broad adoption across multiple industries.

Marketing

Modern marketing focuses on segmenting customers into well-defined and distinct groups. Each of these groups exhibits certain characteristics which reflect the habits of its customers. To generate such segmentation, neural networks present themselves as an efficient solution, because their predictive power lets them identify patterns in a customer’s purchasing habits.

For instance, they can analyze how much time customers take between purchases, how much they spend, and what they mostly purchase. An ANN’s input layer takes attributes like location, demographics, and other personal or financial information about a customer to generate meaningful output.

Supervised neural networks are usually trained to comprehend the link between clusters of data. On the other hand, unsupervised neural networks are used for segmentation of customers.

Forecasting

Forecasting is part and parcel of a varied list of domains, including government, sales, finance, and other industries, especially in their monetary and economic aspects. Often, forecasting runs into a roadblock because of its complexity. For instance, predicting stocks is considered difficult because the stock market is driven by multiple seen and unseen factors, which makes traditional forecasting ineffective.

Conventional forecasting is founded merely on statistics. ANNs use the same statistical methods and techniques but enhance forecasting, because their layers are sophisticated enough to tackle the complexity of the stock market. Moreover, in contrast to the conventional methods, ANNs place no restrictions on the input values and residual distributions.

Image Processing

Since the layers in artificial neural networks are able to accept several input values and compute them flexibly to determine complex and non-linear hidden relationships, they are well-equipped to serve in image processing and character recognition. In criminal investigations like bank fraud cases, fraud detection requires accurate character recognition, because humans cannot go over thousands of samples to pinpoint a match. Here, ANNs are useful, as they are able to recognize the smallest of irregularities. Similarly, ANNs are used in facial recognition with positive results, where they help to improve governance and security.

Final Thoughts

The emergence of artificial neural networks has opened a whole new world of possibilities for machine learning. With their adoption in real-world industries, they have become one of the most trending and researched topics in a short period of time.

What Is AIOps?


Recently, I came across a very interesting topic: AIOps.

AIOps refers to Artificial Intelligence for IT operations. It involves the use of machine learning, big data analytics, and AI tools to automate the IT infrastructure of an organization.

In larger enterprises, the applications, systems, and services generate massive volumes of data. With AIOps, organizations can utilize this data to monitor their assets and examine their IT dependencies more closely.

Capabilities

Ideally, an AIOps solution provides the following functionalities.

1- Automation for Routine Procedures

AIOps enables organizations to integrate automation into their daily routine procedures. This automation can handle requests from users or manage non-critical notifications from the system. For instance, if AIOps is used, then a help desk system can respond appropriately to a request from a user, all by itself. Similarly, AIOps tools can assess an alert from a system and evaluate whether it requires any action, without the need for a supervising authority.

2- Detection

AIOps can detect critical issues quicker and better than any manual strategy. When a familiar malware is detected on a non-critical system, the IT experts may try to eliminate it. In the meantime, they might miss an unusual activity or process on a critical system from a newly-arrived and sophisticated threat. As a consequence, the organization suffers a huge setback.

On the other hand, AIOps can make a difference through vulnerability prioritization, i.e., it immediately notifies the authority about a possible cyber invasion of the critical system, while for the non-critical system, it can respond by running an anti-malware tool.

3- Streamlining Interactions

Historically, before AIOps came on the scene, teams had to share and work on information through either meetings or manual exchange of data. AIOps can streamline the communication and coordination processes between teams and data center groups. It shows “relevant” data to each IT group. For this, the AIOps platform must be designed so that it can monitor and analyze which type of data to present to which team.

Technologies

AIOps combines several techniques: aggregation, algorithms, analytics, data output, visualization, machine learning, and automation and orchestration. All of these techniques are mature and well integrated. So how do they work?

Log files, helpdesk ticketing systems, and monitoring tools provide the data for AIOps. Big data tools are then used to manage and aggregate all of the data coming out of these systems and convert it into a more useful format. Next, analytics methods take this raw data and transform it into new, meaningful information. In doing so, analytics eliminates “noise” and irrelevant data. Additionally, it searches for recurring patterns which can be used to detect and mark common issues.

However, analytics cannot run without proper algorithms. Algorithms help an AIOps solution respond with the most appropriate course of action. They are configured so that the IT staff can teach the platform how to make decisions pertaining to application performance.

Algorithms are at the center of machine learning. The AIOps platform establishes a baseline for normal activities and behavior, and it can keep updating itself by adding new algorithms as new data arrives in the organization’s infrastructure.

Automation ensures that an AIOps tool is quick in performing the required action or set of actions. Automated tools are “triggered” to act based on the output of the machine learning and analytics tools. For instance, a machine learning tool may establish that an application in a system requires additional storage to function. This piece of information is passed on to an automation tool, which then performs an action like adding more storage.
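To give a feel for how a learned baseline can trigger an automated action, here is a deliberately simple Python sketch. It is not how any particular AIOps product works: the three-standard-deviation threshold, the function names is_anomalous and choose_action, and the action labels are all assumptions made for illustration.

```python
from statistics import mean, stdev
from typing import List


def is_anomalous(history: List[float], latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `threshold` standard deviations from the baseline."""
    baseline = mean(history)
    spread = stdev(history)  # needs at least two historical values
    if spread == 0:
        return latest != baseline
    return abs(latest - baseline) / spread > threshold


def choose_action(storage_usage_history: List[float], latest_usage: float) -> str:
    """Pick an automated action based on how the latest reading compares to the baseline."""
    unusually_high = (
        is_anomalous(storage_usage_history, latest_usage)
        and latest_usage > mean(storage_usage_history)
    )
    return "add_storage" if unusually_high else "no_action"


if __name__ == "__main__":
    usage_percent = [50.0, 52.0, 49.0, 51.0, 50.0]  # hypothetical past readings
    print(choose_action(usage_percent, 51.0))
    print(choose_action(usage_percent, 90.0))
```

A real platform would learn far richer baselines from logs, tickets, and monitoring data, but the division of labor is the same: analysis decides that something is abnormal, and automation carries out the response.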

Lastly, visualization is used in AIOps to help in decision making. It generates dashboards which are easy to use and read. These dashboards contain graphs, reports, and other visual elements to simplify different types of output. As a result, the management is able to remain in the loop and make rational decisions.

How Has AIOps Proved to Be a Breakthrough?

Before the emergence of AIOps, organizations faced difficulties because their IT personnel spent much of their time on routine and basic tasks. AIOps proved to be a breakthrough by helping organizations focus on more critical issues. As a result, such platforms have saved a great deal of time. IT personnel now attempt to train and educate AIOps platforms to become familiar with the organization’s IT infrastructure.

Afterward, these platforms continue to update and evolve by making use of machine learning and algorithms, as well as the “learned” history which they have accumulated over time. Therefore, they make for an excellent monitoring tool that has the “rationality” to perform many useful tasks.

Moreover, AIOps platforms examine and inspect causal relationships from various services, resources, data sources, and systems. Machine learning functionalities identify and run robust root cause analysis. As a result, troubleshooting of frequent issues is enhanced.

Furthermore, AIOps helps organizations to increase collaboration among all the departments. With the reports from visualization, team leaders are able to comprehend requirements and perform their duties with a renewed sense of direction.

The Other Side of the Coin

AIOps is extremely promising, but some analysts consider it to be unrefined. They argue that an AIOps platform is only as effective as its “training”, and that creating, implementing, and administering such a platform may be too time-consuming for many organizations.

Likewise, they argue that, due to its ability to perform a wide variety of tasks, it requires trust from organizations. Since AIOps tools work autonomously, they have to be trained in such a way that they can easily adapt to the environment of their organization, collect data, come to the most logical conclusion, and allocate actions accordingly.

What Are Text Indexes in MongoDB?


MongoDB offers text indexes for search queries on string content. A collection in MongoDB can have at most one text index. So the question is: how do you build a text index?

Like other indexes, a text index can be created using the db.collection.createIndex() method. Such an index can be built on a field which holds a string or an array of strings. To define a text index for a field, you have to specify “text” as the index type, as in the following instance.

db.employees.createIndex( { name: "text" } )

In this example, a text index has been built on the “name” field. Similarly, other fields can also be included in the same index.

Weights

While working with text indexes, you must become familiar with the concept of weight. A weight is assigned to an indexed field and denotes its importance relative to the other indexed fields when the text search score is computed.

For each indexed field in a document, MongoDB multiplies the number of matches by the field’s weight and sums the results. It then uses this sum to calculate the document’s score.

By default, each indexed field carries a weight of 1. Weights can be modified in the db.collection.createIndex() method.
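
As a rough illustration of the weighted sum described above, the following JavaScript sketch multiplies each field’s match count by its weight and adds the results. The function name weightedMatchSum, the “bio” field, and the sample weights are hypothetical, and MongoDB processes this sum further to produce the final score, so treat this as a simplification rather than the actual formula.

```javascript
// Simplified sketch: sum of (matches in a field x weight of that field).
// NOT MongoDB's real scoring formula, which processes this sum further.
function weightedMatchSum(matchCounts, weights) {
  let sum = 0;
  for (const [field, matches] of Object.entries(matchCounts)) {
    // Fields without an explicit weight fall back to the default of 1.
    const weight = weights[field] ?? 1;
    sum += matches * weight;
  }
  return sum;
}

// Hypothetical case: "name" is weighted 10, "bio" keeps the default weight.
const total = weightedMatchSum({ name: 2, bio: 3 }, { name: 10 });
console.log(total);
```

In mongosh, weights are supplied through the weights option of createIndex, for example db.employees.createIndex( { name: "text", bio: "text" }, { weights: { name: 10 } } ), where “bio” is again an assumed field.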

Wildcard Specifier

MongoDB also provides a wildcard specifier ($**). When this specifier is used with a text index, the result is referred to as a wildcard text index. It indexes every field that contains string data in every document of the collection. A wildcard text index can be defined as follows.

db.collection.createIndex( { "$**": "text" } )

Basically, wildcard text indexes can be seen as text indexes which work on more than a single field. To govern the query results’ ranking, weights can be specified for certain fields while building text indexes.

Case Insensitivity

Version 3 of the text index (the latest) supports the common C and simple S case foldings. The special T case folding used for Turkish languages is also supported.

Case insensitivity is complemented by diacritic insensitivity. A diacritic is a mark that indicates a different pronunciation, as in É and é. This means that the text index does not differentiate between e, E, é, and É.
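
The effect can be approximated in plain JavaScript. The sketch below is only an illustration: the helper name foldForSearch is hypothetical, and it assumes that Unicode NFD normalization plus lowercasing is close enough to show the idea, whereas MongoDB’s own folding rules are more extensive.

```javascript
// Approximation of case- and diacritic-insensitive matching.
// Assumption: NFD normalization + lowercasing is enough for illustration.
function foldForSearch(text) {
  return text
    .normalize("NFD")                // split "é" into "e" + combining accent
    .replace(/[\u0300-\u036f]/g, "") // drop the combining diacritical marks
    .toLowerCase();                  // ignore letter case
}

// All four variants fold to the same string.
const folded = ["e", "E", "é", "É"].map(foldForSearch);
console.log(folded);
```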

Tokenization

For purposes related to tokenization, the text index version 3 supports the following delimiters.

  • Hyphen
  • Dash
  • Quotation_Mark
  • White_Space
  • Terminal_Punctuation
  • Pattern_Syntax

For instance, if a text index finds the following text string, then it would treat spaces and “« »” as delimiters.

«Messi est l’un des plus grands footballeurs de tous les temps»
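
A minimal sketch of this splitting step is shown below. It handles only the delimiters at play in this example (whitespace and the « » quotation marks) and ignores the other delimiter categories; the function name tokenize is hypothetical and this is not MongoDB’s actual tokenizer.

```javascript
// Split on whitespace and on the guillemets « », then drop empty pieces.
// Assumption: only the delimiters from the example above are handled.
function tokenize(text) {
  return text.split(/[\s«»]+/u).filter((token) => token.length > 0);
}

const tokens = tokenize(
  "«Messi est l’un des plus grands footballeurs de tous les temps»"
);
console.log(tokens);
```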

Sparse

Text indexes are always “sparse”, so they do not need to be explicitly defined with the sparse option. When a document does not contain a text-indexed field, MongoDB does not add an entry for that document to the text index. On insertion, MongoDB still stores the document, but nothing is added to the text index.

Limitations

Earlier, we mentioned that a collection can have no more than one text index. Text indexes have some further limitations.

  • When a query entails the $text operator, then it is not possible to use the hint() method.
  • It is not possible for sort operations to utilize the text index’s arrangement or ordering, even if it is a compound text index.
  • A text index key can be part of a compound index, although with some restrictions. For example, a compound text index cannot include special index types such as a geospatial index.

Moreover, when building a compound text index, all text index keys have to be listed adjacently in the index specification document.

Lastly, if any keys precede the text index key, then a search with the $text operator requires the query predicate to include equality match conditions on those preceding keys.

  • To drop a text index, you must pass the index name to the db.collection.dropIndex() method. If you do not know the index name, use the db.collection.getIndexes() method to look it up.
  • Text indexes do not support collation; they support only simple binary comparison.

Performance and Storage Constraints

While using text indexes, it is necessary to understand their impact on the performance and storage of your application.

  • Creating a text index is similar to creating a huge multikey index. It takes considerably more time than building a basic ordered index.
  • Depending on the application, text indexes can be “enormous”. They hold one index entry for every unique post-stemmed word in each indexed field of every document inserted.
  • Text indexes do not store phrases or information about the proximity of words within a document. Therefore, phrase queries run much more effectively when the complete collection fits into RAM.
  • Text indexes affect insertion throughput in MongoDB. This is because MongoDB has to add an index entry for every unique post-stemmed word in the indexed fields of each new source document.
  • When creating a big text index on an existing collection, make sure that your limit on open file descriptors is high enough.

$text Operator

The $text operator performs a text search on the contents of the fields that are text-indexed. A $text expression has the following components.

  • $search – One or more terms that MongoDB parses and uses to query the text index.
  • $language – An optional component which sets the language used for the tokenizer, the stemmer, and the stop words.
  • $caseSensitive – An optional component which turns case-sensitive search on or off.
  • $diacriticSensitive – An optional component which turns diacritic-sensitive search on or off.
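
To show how these components fit together, here is a small sketch that assembles a $text filter document in plain JavaScript. The helper buildTextFilter and the search term are hypothetical; only the four component names come from the list above.

```javascript
// Hypothetical helper that assembles a $text filter document.
function buildTextFilter(search, options = {}) {
  const text = { $search: search };
  // Optional components are added only when the caller supplies them.
  for (const key of ["$language", "$caseSensitive", "$diacriticSensitive"]) {
    if (options[key] !== undefined) {
      text[key] = options[key];
    }
  }
  return { $text: text };
}

const filter = buildTextFilter("adam", { $caseSensitive: true });
console.log(JSON.stringify(filter));
```

In mongosh, such a document would be passed to find(), for instance db.employees.find(filter), assuming the “employees” collection has a text index.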

An Introduction to Multikey Indexes with Examples


After understanding single-field and compound indexes, now is the time to learn about multikey indexes. When an indexed field holds an array value, an index key is generated for each of the array’s elements. Such indexes are referred to as multikey indexes. They can be used with arrays that hold scalar values as well as those that hold nested documents.

To create a multikey index, use the standard db.collection.createIndex() method. MongoDB automatically makes an index multikey whenever an indexed field holds an array value. Hence, there is no need to declare the multikey type explicitly.

Examples

For a standard array, let’s suppose we have a collection “student” with the following document.

{ _id: 21, name: "Adam", marks: [ 80, 50, 90 ] }

To build an index on the “marks” field, write the following query.

db.student.createIndex( { marks: 1 } )

Since the marks field is an array, this index is a multikey index. Each of its keys (80, 50, and 90), one per array element, points to the same document.
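
Conceptually, the index entries for this document can be pictured as in the following sketch. It is a simplified model and not MongoDB’s internal storage format; the function name multikeyEntries is hypothetical.

```javascript
// Simplified model: one index entry per array element, each pointing
// back to the _id of the document that contains it.
function multikeyEntries(doc, field) {
  const value = doc[field];
  const elements = Array.isArray(value) ? value : [value];
  return elements
    .map((key) => ({ key, _id: doc._id }))
    .sort((a, b) => a.key - b.key); // ascending order, as in { marks: 1 }
}

const entries = multikeyEntries(
  { _id: 21, name: "Adam", marks: [80, 50, 90] },
  "marks"
);
console.log(entries);
```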

To create a multikey index for array fields having embedded documents, let’s suppose we have a collection “products”.

{
  _id: 5,
  name: "tshirt",
  details: [
    { size: "large", type: "polo", stock: 50 },
    { size: "small", type: "crew neck", stock: 40 },
    { size: "medium", type: "v neck", stock: 60 }
  ]
}

{
  _id: 6,
  name: "pants",
  details: [
    { size: "large", type: "cargo", stock: 35 },
    { size: "small", type: "jeans", stock: 65 },
    { size: "medium", type: "harem", stock: 10 },
    { size: "large", type: "cotton", stock: 10 }
  ]
}

{
  _id: 7,
  name: "jacket",
  details: [
    { size: "large", type: "bomber", stock: 35 },
    { size: "medium", type: "leather", stock: 25 },
    { size: "medium", type: "parka", stock: 45 }
  ]
}

We can build a multikey index with the details.size and details.stock fields.

db.products.createIndex( { "details.size": 1, "details.stock": 1 } )

This index can now serve queries on the “details.size” field alone as well as queries on both of the indexed fields. As such, the following types of queries would benefit from the index.

db.products.find( { "details.size": "medium" } )

db.products.find( { "details.size": "small", "details.stock": { $gt: 10 } } )

Bounds in Multikey Index

Bounds represent the limits of an index scan, i.e. how much of the index MongoDB needs to scan to find a query’s results. If a query has more than one predicate on an index, MongoDB tries to combine their bounds through either intersection or compounding. So what do we mean by intersection and compounding?

Intersection

Bounds intersection refers to the logical conjunction (“AND”) of multiple bounds. For example, given the two bounds [ [ 4, Infinity ] ] and [ [ -Infinity, 8 ] ], their intersection is [ [ 4, 8 ] ]. When the $elemMatch operator is used, MongoDB can apply intersection to multikey index bounds.

What Is $elemMatch?

Before moving forward, let’s understand the use of $elemMatch first.

$elemMatch matches documents that contain an array field in which at least one element satisfies all of the specified query conditions. Bear in mind that the operator cannot be combined with the $text and $where operators. For a basic example, consider a “student” collection.

{ _id: 8, marks: [ 72, 75, 78 ] }

{ _id: 9, marks: [ 65, 78, 79 ] }

The following query only matches documents in which the “marks” array has at least one element that is greater than or equal to 70 and less than 75.

db.student.find(
  { marks: { $elemMatch: { $gte: 70, $lt: 75 } } }
)

In response, the result set consists of the following document.

{ "_id" : 8, "marks" : [ 72, 75, 78 ] }

Although 75 and 78 do not satisfy the conditions, 72 does, so $elemMatch selected the document.

Returning to bounds intersection, suppose we have a collection “student” which has a field “name” and an array field “marks”.

{ _id: 4, name: "ABC", marks: [ 1, 10 ] }

{ _id: 5, name: "XYZ", marks: [ 5, 4 ] }

Build a multikey index on the “marks” array.

db.student.createIndex( { marks: 1 } )

Now, the following query makes use of $elemMatch which means that the array must have at least one element which fulfils the condition of both predicates.

db.student.find( { marks : { $elemMatch: { $gte: 4, $lte: 8 } } } )

Computing the predicates one by one:

  • For the first predicate (greater than or equal to 4), the bounds are [ [ 4, Infinity ] ].
  • For the second predicate (less than or equal to 8), the bounds are [ [ -Infinity, 8 ] ].

Since $elemMatch is used here, MongoDB can intersect the bounds and combine them as follows.

marks: [ [ 4, 8 ] ]
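
The intersection itself is simple interval arithmetic, as the following sketch shows. The function intersectBounds is a hypothetical illustration with inclusive endpoints, not a MongoDB API.

```javascript
// Each bound is a [low, high] pair; both ends are inclusive.
function intersectBounds([lowA, highA], [lowB, highB]) {
  const low = Math.max(lowA, lowB);
  const high = Math.min(highA, highB);
  // Return null when the two ranges do not overlap at all.
  return low <= high ? [low, high] : null;
}

const combined = intersectBounds([4, Infinity], [-Infinity, 8]);
console.log(combined);
```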

On the other hand, if $elemMatch is not used, MongoDB cannot apply intersection to the multikey index bounds. For instance, consider the following query.

db.student.find( { marks : { $gte: 4, $lte: 8 } } )

The query checks the marks array for at least one element that is greater than or equal to 4 “AND” at least one element that is less than or equal to 8. However, a single element does not have to satisfy both predicates, so MongoDB does not intersect the bounds and uses either [ [ 4, Infinity ] ] or [ [ -Infinity, 8 ] ].
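
The difference in matching semantics can be reproduced with plain JavaScript arrays. The sketch below only simulates the two queries in memory on the two sample documents; it does not call MongoDB.

```javascript
const students = [
  { _id: 4, name: "ABC", marks: [1, 10] },
  { _id: 5, name: "XYZ", marks: [5, 4] },
];

// Simulates { marks: { $elemMatch: { $gte: 4, $lte: 8 } } }:
// one single element has to satisfy both conditions.
const withElemMatch = students.filter((doc) =>
  doc.marks.some((m) => m >= 4 && m <= 8)
);

// Simulates { marks: { $gte: 4, $lte: 8 } }:
// each condition may be satisfied by a different element.
const withoutElemMatch = students.filter(
  (doc) => doc.marks.some((m) => m >= 4) && doc.marks.some((m) => m <= 8)
);

console.log(withElemMatch.map((d) => d._id));
console.log(withoutElemMatch.map((d) => d._id));
```

With $elemMatch only the document with _id 5 qualifies, while without it the document with _id 4 also matches, because 10 satisfies the first condition and 1 satisfies the second.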

Compounding

Compounding bounds means combining the bounds of the individual keys of a compound index. For example, suppose there is a compound index { x: 1, y: 1 } with a bound of [ [ 4, Infinity ] ] on the x field and a bound of [ [ -Infinity, 8 ] ] on the y field. Applying compounding gives the following.

{ x: [ [ 4, Infinity ] ], y: [ [ -Infinity, 8 ] ] }

Sometimes, MongoDB is unable to apply compounding on the given bounds. For such scenarios, it uses the bound on the leading field which in our example is x: [ [4, Infinity] ].

Now suppose an index is built on multiple fields, one of which is an array. For instance, we have the collection “student” which stores the “name” and “marks” fields.

{ _id: 10, name: "Adam", marks: [ 1, 10 ] }

{ _id: 11, name: "William", marks: [ 5, 4 ] }

Build a compound index with the “name” and the “marks” field.

db.student.createIndex( { name: 1, marks: 1 } )

In the following query, there is a condition which applies on both of the indexed keys.

db.student.find( { name: "William", marks: { $gte: 4 } } )

Computing these predicates step by step:

  • For the “name” field, the bounds for the “William” predicate are the following [ [ “William”, “William” ] ].
  • For the “marks” field, the bounds for { $gte: 4 } predicate are [ [ 4, Infinity ] ].

MongoDB can apply compounding on both of these bounds.

{ name: [ [ "William", "William" ] ], marks: [ [ 4, Infinity ] ] }
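
To make the effect of the compounded bounds concrete, the following sketch builds an in-memory list of (name, mark) keys for the two sample documents and keeps only the keys that fall inside both bounds. It is a simplified model of an index scan under the assumption that one key is generated per array element; it does not reflect MongoDB’s internal structures.

```javascript
const docs = [
  { _id: 10, name: "Adam", marks: [1, 10] },
  { _id: 11, name: "William", marks: [5, 4] },
];

// One (name, mark) key per array element, sorted like { name: 1, marks: 1 }.
const keys = docs
  .flatMap((doc) =>
    doc.marks.map((mark) => ({ name: doc.name, mark, _id: doc._id }))
  )
  .sort((a, b) =>
    a.name < b.name ? -1 : a.name > b.name ? 1 : a.mark - b.mark
  );

// Compounded bounds from the text; both ends are inclusive.
const bounds = { name: ["William", "William"], marks: [4, Infinity] };

// Keep only the keys that lie within the bounds of both fields.
const scanned = keys.filter(
  (k) =>
    k.name >= bounds.name[0] &&
    k.name <= bounds.name[1] &&
    k.mark >= bounds.marks[0] &&
    k.mark <= bounds.marks[1]
);

console.log(scanned);
```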