Amazon S3 is used by many companies for storage purposes. Due to its use as object storage, it offers flexibility with a slew of data types including small objects to massive datasets. Thus, S3 has carved out its niche as a great service which can store a broad scope of data types via a resilient and available environment. As your objects of S3 have to be accessed and read by other AWS services, applications, and end users, do you believe that they are optimized to offer the best possible performance? Here is how you can optimize your S3. Follow these tips to improve your performance with Amazon S3.
Perform TCP Window Scaling
TCP window scaling facilitates developers to improve the network throughput performance via the modification of the header in the TCP packet that uses a window scale. This helps in sending data with a single segment—more than the traditional 64 KB. It is important to note that such practice is exclusive to S3; it functions along with the protocol level. As a result, by using this protocol you can execute window scaling for your client while establishing a connection with a server.
When a connection is established between a destination and source by the TCP, then the next thing is a 3-way handshake that starts up from the source. This means that with the S3 view, it is possible that the client might be required to upload an object to the S3. However, prior to this, you must create a connection with the S3 servers.
A TCP packet will be sent by the client along with a defined window scale of TCP in the header—such a request is also referred to as SYN request—the first part of the 3-way handshake. When the S3 gets this request, it uses an SYN/ACK message to send a response to the client while using the same window scale factor—this forms the second part of the 3-way handshake—maintaining the relevant window scale factor. The third and final part consists of an ACK message which is sent to the S3 server—it serves as the response’s acknowledgment. A connection is generated after this 3-way handshake ends where the S3 and client can finally exchange data.
In order to send more data, you can widen the window size via a scale factor. This helps in sending voluminous amount s of data via a single segment whereas your speed is also quickened.
Use Selective Acknowledgment
At times while using TCP, it is not uncommon for packets to get lost. To figure out which of the packets went missing is hard within a TCP window. Consequently, at times it is possible to resend all of these packets. However, then the receiver may have received some of the packets so this is an ineffective strategy.
Instead, you can use TCP SACK (selective acknowledgment) for improving performance where the sender receives notifications about which were the failed packets for a window. As a result, the sender can then easily only resend the failed packets.
However, it is necessary that the source client or the sender initiates the SACK when a connection is being established amidst the handshake’s SYN phase. Such an option is also called as SACK-permitted. For using and implementing SACK, you can visit this link.
Setting Up S3 Request Rates
Alongside TCP SACK and TCP Scaling, S3 is quite nicely optimized to address a high request throughput. A year back, in 2018, AWS introduced a new change for these request rates. Before the announcement, it was recommended that the prefixes can be randomized within the bucket to help with performance optimization—there is no more need for it. Now exponential growth of request rate performance can be achieved when more than one prefixes within the bucket are used.
Now, developers are getting a 3,500 PUT/POST/DELETE request for each second while they are also achieving 5,500 GET requests. A single prefix is a reason behind such limitations. However, keep in mind that there is no limit for prefixes which are to be used in an S3 bucket. What this means is that if you use 2 prefixes then you can get 110,000 GET requests and 70,000 PUT/POST/DELETE for each passing second for the same bucket.
There is no hierarchical-based structure in the folders of S3; it follows a flat structure for storage. This means that all you need is a bucket while all the objects are saved in a flat space of the bucket. You can generate folders and store objects in it—without depending on a hierarchical system. The prefixes of the object are responsible to make them unique. For instance, in case you have the following objects in a bucket:
Here, the ‘Design’ folder serves as a prefix for identifying the object—such a pathname is also referred to as the object key. The ‘Objective’ folder is also an object’s prefix while the ‘Will.jpg’ is without any prefix.
Amazon Cloud Front
One more strategy for optimization is to integrate Amazon CloudFront with Amazon S3. This is a wise strategy when the request of the S3 data is a GET request. CloudFront is a content delivery network which increases the pace of the distribution of dynamic and static content across a global network comprising of edge locations.
Typically, after a user sends a request from S3 in the form of GET request then the S3 service is used to route it while the relevant servers return the content. In case you use CloudFront with S3 then it can also perform caching for those objects which are requested commonly. Hence, the user’s GET request is directed to the nearest edge location that offers low latency for returning the cached object and providing the best performance. It also decreases the AWS costs when the number of GET requests in the buckets is shortened.