Deep dive into AWS for developers | Part 4 — S3

aditya goel
19 min read · May 8, 2021


In case you are landing here directly, it is recommended to first visit this page. In this section, we will deep dive into AWS S3.

Amazon S3 is an “infinitely scaling” storage service, so we don’t need to plan its storage size in advance. It is one of the building blocks of the AWS cloud. Many websites use Amazon S3 as their backbone, and many other AWS services use Amazon S3 as an integration component as well. Amazon S3 allows us to store objects (files) in S3 buckets (directories). Bucket names must be globally unique (across all accounts around the globe), but buckets themselves are defined at the region level. Please note that S3 is not a global service; it only has a global console. Following is the naming convention for creating S3 buckets :-

  • No uppercase. Must start with lowercase letter or number.
  • No underscore.
  • Length must be between 3 and 63 characters.
  • The name must not be in the form of an IP address.

While objects are stored inside an Amazon S3 bucket, each object has a key associated with it. In the demonstration below, the text highlighted in ‘blue’ is the KEY of the object being stored in S3.

The object value is the content of the body. The maximum allowed object size is 5 TB (i.e. about 5,000 GB). Each object can also carry a ‘Version ID’, once versioning is enabled on the bucket.

Demonstration of AWS S3 :- Let’s first create an S3 bucket by specifying the bucket name and region :-

Next, there is an option to block all public access for this specific bucket :-

Next, there are some other settings, such as bucket versioning and encryption, which we can define during bucket creation :-

Finally, upon hitting the create button, the bucket gets created :-

Let’s now upload a file to this S3 bucket (here, we are uploading this picture) :-

Once the file has been uploaded, the status looks like this :-

We can also open the uploaded file by clicking “Object Actions” → “Open”. This option uses a pre-signed URL. These pre-signed URLs are usually quite long. In our case, the pre-signed URL looks something like this :

https://adityas-first-s3-bucket.s3.ap-south-1.amazonaws.com/S3-bucket-picture.png?response-content-disposition=inline&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEOX%2F%2F%2F%2F%2F%2F%2F%2F%2F%2FwEaCmFwLXNvdXRoLTEiRjBEAiAhOl1c….c75

Here is how the file can be viewed using a pre-signed URL. Please note that this is not just the public URL; the authentication credentials are passed in the URL itself.
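
For reference, the same kind of temporary, authenticated link can also be generated from the AWS CLI. A minimal sketch (bucket and key names taken from this demo; the expiry value is just an example):

    # Generate a pre-signed URL for the object, valid for 1 hour
    aws s3 presign s3://adityas-first-s3-bucket/S3-bucket-picture.png --expires-in 3600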

Now, since we blocked public access, accessing this file through the public URL (i.e. the Object URL) throws an ‘Access Denied’ error. This also shows that our S3 bucket is not public. In our case, the public URL is :

https://adityas-first-s3-bucket.s3.ap-south-1.amazonaws.com/S3-bucket-picture.png

In exactly the same way, we can also create folders inside our S3 bucket and upload files under those folders.

Introduction to AWS S3 Versioning :- We can version our files inside S3. Versioning first needs to be enabled at the bucket level. If we upload a different version of a file with the same key, it does not actually override the file; instead, a new version is created for that key. As a recommended practice, it is advisable to enable versioning on our buckets. Versioning provides protection from unintended deletes, and files can be restored/rolled back to previous versions. Please note that any file that existed prior to enabling versioning will have the version “null”.

Demonstration of Versioning :- First, let’s enable versioning at the S3 bucket level :-
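
For reference, the same setting can be applied from the AWS CLI; a minimal sketch, assuming the bucket name used throughout this demo:

    # Enable versioning on the bucket
    aws s3api put-bucket-versioning \
        --bucket adityas-first-s3-bucket \
        --versioning-configuration Status=Enabled

    # List all object versions (and delete markers) in the bucket
    aws s3api list-object-versions --bucket adityas-first-s3-bucket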

Next, after the property has been enabled, here is how the confirmation looks :-

Next, once the property is enabled, we can also list the corresponding versions of the objects displayed in the S3 console. Please note that in the case below, the versionId shown is ‘null’, because at the time this object was uploaded, versioning was not yet enabled for this bucket.

Let’s now test by uploading a new object (again an image) to our S3 bucket. We note that a versionId has been automatically assigned to the object :-

Let’s now test by uploading a new object with the same name as the already-existing image. Now that versioning has been enabled, the latest file has a version associated with it, while the original file has a “null” version. Please note that with versioning enabled, all previous versions of a given file are stored and kept safely in the S3 bucket.

Let’s now disable the “List versions” toggle and delete this file from the S3 bucket :-

We can easily see that the file is gone from the S3 bucket :-

Now, let’s enable “List versions” again; we observe that the file is still present, but with a “Delete Marker”. To the console user it appears that the file doesn’t exist anymore, but under the hood the file still exists. The “Delete Marker” is itself a new version of this file, and there is a corresponding versionId associated with this deleted file as well.

Next, note that deleting a specific version of a file permanently deletes that version from S3. Let’s go ahead and delete the latest versionId of this file, which corresponds to the “Delete Marker”.
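
For reference, a version-specific delete can also be issued from the CLI; a sketch in which the key name is taken from this demo and the version-id is a hypothetical placeholder for the delete marker’s versionId:

    # Permanently delete one specific version (here, the delete marker),
    # which effectively restores the object
    aws s3api delete-object \
        --bucket adityas-first-s3-bucket \
        --key S3-bucket-picture.png \
        --version-id "PASTE-DELETE-MARKER-VERSION-ID-HERE"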

Since deleting a particular version is a destructive operation that can’t be undone, S3 asks us for confirmation to permanently delete this specific version :-

Now the files and their corresponding versions look back to normal: the file is active again, and the versionId corresponding to the “Delete Marker” is permanently gone.

Lastly, as of writing this blog, AWS S3 provides a strong read-after-write consistency model for objects stored on S3. It means that any read request for a given object is guaranteed to fetch the latest version of that object. There is no additional cost or performance penalty imposed for this behaviour.

Introduction to AWS S3 Encryption :- The files that we upload to AWS S3 are ultimately stored somewhere on AWS-owned servers, and to comply with your organisational standards you may want to make sure those uploaded files are not readable in plain form. Depending on the approach chosen below, AWS may not even know which keys have been used.

Approach no. 1 | Server-Side-Encryption (SSE-S3) :- Here the encryption keys are managed and handled by AWS S3 itself. The object is encrypted at the server side, which is why this is called SSE, an acronym for “Server Side Encryption”. The algorithm used is AES-256. In order to get uploaded objects encrypted through SSE, clients need to indicate it while uploading objects to S3. This is done by setting the header ‘x-amz-server-side-encryption’ to ‘AES256’ while invoking the ‘PutObject’ API. Please note that the data key with which the encryption is performed is 100% managed by S3 itself.
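
A minimal sketch of how this header gets set when uploading via the AWS CLI (the bucket name is from this demo; the file and key names are illustrative):

    # Upload an object with SSE-S3 (AES-256) server-side encryption
    aws s3api put-object \
        --bucket adityas-first-s3-bucket \
        --key coffee.jpg \
        --body ./coffee.jpg \
        --server-side-encryption AES256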

Approach no. 2 | KMS based encryption (SSE-KMS) :- Here the encryption keys are managed and handled by AWS KMS, i.e. the ‘Key Management Service’. Encryption of objects again happens at the server side. In order to get uploaded objects encrypted through KMS, clients need to set the header ‘x-amz-server-side-encryption’ to “aws:kms”. KMS provides user control along with an audit trail.
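
The CLI equivalent is the same call with a different header value; a minimal sketch (object name illustrative; this uses the AWS managed KMS key unless a specific key id is supplied):

    # Upload an object encrypted server-side with a KMS key
    aws s3api put-object \
        --bucket adityas-first-s3-bucket \
        --key coffee.jpg \
        --body ./coffee.jpg \
        --server-side-encryption aws:kms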

Approach no. 3 | Customer managed Key (SSE-C) :- Here the encryption keys are managed and handled by the customer, completely outside of AWS; Amazon S3 does not store the encryption keys we provide. In order to transmit this data to AWS, we must use the ‘https’ scheme, because we are going to send a secret to AWS and therefore must have encryption in transit. The encryption key must be provided through HTTP headers for every HTTP request made. Once both the file (i.e. object) and the key reach AWS S3, server-side encryption happens using the client-specified key. Even when we want to retrieve this file from S3, we have to provide the same customer-managed key. Overall, it adds a lot of management on the client side. Please note that HTTPS is mandatory with this approach.
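
A sketch of SSE-C from the CLI, assuming a 256-bit key generated locally; depending on the CLI version, the key’s MD5 may also need to be supplied explicitly, and the very same key must be passed again on every read:

    # Generate a 256-bit key locally (kept entirely outside AWS)
    openssl rand -out sse-c.key 32

    # Upload with the customer-provided key; S3 encrypts server-side but never stores the key
    aws s3api put-object \
        --bucket adityas-first-s3-bucket \
        --key secret.txt \
        --body ./secret.txt \
        --sse-customer-algorithm AES256 \
        --sse-customer-key fileb://sse-c.key

    # The same key must be provided again to read the object back
    aws s3api get-object \
        --bucket adityas-first-s3-bucket \
        --key secret.txt \
        --sse-customer-algorithm AES256 \
        --sse-customer-key fileb://sse-c.key \
        ./secret-downloaded.txt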

Approach no. 4 | Client side Encryption :- Here, we perform the encryption ourselves before uploading the object to AWS S3. Client libraries such as the ‘Amazon S3 Encryption Client’ are one way to perform client-side encryption. Clients are also solely responsible for decrypting the data after downloading it from AWS S3. Here, the customer/client fully manages the keys and the encryption cycle.

Demonstration of AWS S3 Encryption :- We can either enable the default-encryption property at the S3 bucket level itself, or specify the encryption at the time of uploading a file.

Approach #1.) Specifying encryption at file-upload time :-

Let’s now upload the file again to S3, and while uploading this file we shall specify the encryption :-

We also have to specify the storage class under properties. We will keep the default for now :-

We would specify the SSE-S3 type of encryption for this object.

Finally, the encrypted object is uploaded to S3 with a new versionId.

Next, as another option, as we saw above, we can also use KMS-based encryption to perform server-side encryption with KMS-managed keys. There are three options available here: the AWS managed KMS key, custom AWS KMS keys, and a KMS key ARN. We are going to use the AWS managed KMS key.

Finally, we have the object uploaded to the S3 bucket; post uploading, this object gets encrypted with the AWS managed KMS key :-

Approach #2.) Specifying encryption at the bucket level itself :-

We can go to the properties of this S3 bucket, enable the default-encryption property, and set the type of encryption as well :-
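
The same default can be configured from the CLI as well; a minimal sketch with SSE-S3 as the default algorithm:

    # Make SSE-S3 (AES-256) the default encryption for new objects in the bucket
    aws s3api put-bucket-encryption \
        --bucket adityas-first-s3-bucket \
        --server-side-encryption-configuration \
        '{"Rules":[{"ApplyServerSideEncryptionByDefault":{"SSEAlgorithm":"AES256"}}]}'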

Now, let’s upload a sample file (‘nature-1.jpeg’) again, without specifying any type of encryption.

We observe that the uploaded file (that very specific versionId) gets encrypted with the SSE-S3 algorithm :-

Introduction to AWS S3 Security :- We can have multiple types of security policies :-

  • User based Security-Policy :- Our IAM users have IAM policies attached, and these dictate whether a given end-user has access to the S3 bucket or not.
  • Resource based Security-Policy :- We can have bucket-wide rules from the S3 console. These help us control which principals can execute which actions on the S3 bucket, and they also allow us to perform cross-account actions on our S3 buckets. Apart from these, we can also have an “Object access control list” and a “Bucket access control list”.

JSON based policies :- A bucket policy can be used to allow access to both types of resources, i.e. at either the bucket or the object level, and it can be granted at the principal level, i.e. to the account or user to which the policy applies. Example: in the example policy below, any object within the ‘examplebucket’ S3 bucket can be read publicly by everyone (since the principal is *).
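
Since the policy screenshot is not reproduced here, the following is a representative public-read bucket policy of the kind described above (the bucket name ‘examplebucket’ comes from the text; the Sid is illustrative):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "PublicReadGetObject",
          "Effect": "Allow",
          "Principal": "*",
          "Action": "s3:GetObject",
          "Resource": "arn:aws:s3:::examplebucket/*"
        }
      ]
    }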

Please note that the API calls to S3 can also be logged in AWS CloudTrail, and S3 access logs can themselves be saved into an S3 bucket as well.

Demonstration of an AWS S3 Bucket Policy :- For this example, let’s create an S3 policy which blocks the upload of any un-encrypted object onto the S3 bucket, i.e. we want every object to be encrypted with the AWS server-side-encryption approach, otherwise we don’t allow the object upload to happen at all. We shall create this policy using the S3 Policy Generator :-

First, let’s choose the policy type as “S3 Bucket Policy”.

Next, we keep the value of principal as *, since we want this policy to be applicable to everyone. Under Actions, we select the API name ‘PutObject’, since objects are uploaded to S3 using this very API. We then set the ARN to ‘arn:aws:s3:::adityas-first-s3-bucket/*’, because we want this policy to apply to all resources uploaded to this S3 bucket.

Next, we need to provide the condition, which is quite important. It means that if the client did not supply the header ‘x-amz-server-side-encryption’ (i.e. its value is null), then the aforesaid deny policy takes effect.

Finally, we add this statement by clicking the “Add Statement” button. Next, we also need to provide another condition: if the supplied value of the header ‘x-amz-server-side-encryption’ is not equal to ‘AES256’, then we deny the object upload to S3 (i.e. the aforesaid policy takes effect). Finally, we have this policy :-
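
A representative version of the policy that the generator produces for these two statements (bucket name taken from this demo; the Sid values are illustrative):

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Sid": "DenyMissingEncryptionHeader",
          "Effect": "Deny",
          "Principal": "*",
          "Action": "s3:PutObject",
          "Resource": "arn:aws:s3:::adityas-first-s3-bucket/*",
          "Condition": { "Null": { "s3:x-amz-server-side-encryption": "true" } }
        },
        {
          "Sid": "DenyIncorrectEncryptionHeader",
          "Effect": "Deny",
          "Principal": "*",
          "Action": "s3:PutObject",
          "Resource": "arn:aws:s3:::adityas-first-s3-bucket/*",
          "Condition": { "StringNotEquals": { "s3:x-amz-server-side-encryption": "AES256" } }
        }
      ]
    }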

Finally, in Step #3, we generate the JSON policy.

We now copy this policy and paste it into the S3 policy placeholder.

Finally, we have this policy created for S3 :-

Now, if we try to upload any object to S3 without specifying SSE encryption, we get an error as shown below :-

Now, let’s try to upload an object to S3 while specifying SSE encryption; this time we succeed :-

Preventing data-leaks from AWS S3 :- In order to prevent data leaks from AWS S3 to the outside world, we usually implement permissions by “Blocking Public Access” at the bucket level. Unless we are serving websites from S3, we would NOT want the objects to be publicly accessible.

We can also specify the aforesaid block-public-access setting at the AWS account level itself. This blocks public access to all objects across all S3 buckets in the account.
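
A sketch of the bucket-level setting via the CLI (the account-level variant uses the ‘s3control’ commands with the account id instead):

    # Turn on all four block-public-access settings for this bucket
    aws s3api put-public-access-block \
        --bucket adityas-first-s3-bucket \
        --public-access-block-configuration \
        BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true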

Next, we also have ACLs, i.e. Access Control Lists, using which we can control the way each object (i.e. each resource) is treated inside the S3 bucket. With the permissions below, only my current AWS account can READ/WRITE this object :-

Demonstrating AWS S3 powered websites :- AWS S3 can host static websites and make them accessible on the world wide web. In order for the website to be publicly accessible, we need to make sure that no policy is denying public access to S3.

Let’s create the following files and upload them to our S3 bucket. Please note that we have disabled the JSON policy (which we created above), so we can now upload resources even without SSE encryption :-

First, let’s create the file : “index.html” and upload it to our S3 bucket.

Next, let’s create the file : “error.html” and upload it to our S3 bucket.

Next, we enable the static website hosting property for this S3 bucket. We also specify the index-document and error-document.
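
A sketch of the same configuration applied from the CLI (index and error document names from this demo):

    # Enable static website hosting with index.html and error.html
    aws s3api put-bucket-website \
        --bucket adityas-first-s3-bucket \
        --website-configuration \
        '{"IndexDocument":{"Suffix":"index.html"},"ErrorDocument":{"Key":"error.html"}}'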

That’s it and we are done. We are now able to see the endpoint for our static website hosted through AWS S3 :-

Let’s access the static S3-hosted website via the aforementioned URL; as expected, we get a 403 error, because we have blocked public access :-

Let’s now disable the “Block public access” setting on this AWS S3 bucket.

Next, let’s create a JSON-based policy using the Policy Generator, to allow public access to the objects/resources in the S3 bucket through the ‘GetObject’ API.

Let’s paste the aforesaid policy text into the bucket policy and save it.

Now, let’s try to access this statically hosted website through its endpoint :-

Next, when we try to access some other (non-existent) file through the above URL, we get the error message that we defined in the error file :-

Cross Origin Resource Sharing :- An origin is a combination of scheme (protocol), host (domain), and port. CORS is a web-browser mechanism that allows requests to other origins while visiting the main origin; it is a browser-based security measure.

  • Same-Origins :- For example, a request from the browser on website-1 (http://www.example.com/app1) to another link on the same origin (http://www.example.com/app2).
  • Different-Origins :- For example, the browser makes a request from website-1 (http://www.example.com) to a link on a different origin (http://www.other.example.com).

The web browser would block this access unless there are correct CORS headers. Thus, the requests won’t be fulfilled unless the other origin allows them, using CORS headers (e.g. Access-Control-Allow-Origin). Let’s take the example below :-

  • Say the web browser made a request to the main origin (https://www.example.com), which in turn needs to access another origin (https://www.other.com).
  • Next, the web browser makes a pre-flight request, in which it asks the cross-origin whether access is allowed from the main origin (www.example.com).
  • The cross-origin then responds (in CORS headers) with which “Access-Control-Allow-Origin” and which “Access-Control-Allow-Methods” are allowed. This is what the cross-origin is allowing this browser to do.
  • Next, the browser issues the actual request to the cross-origin URL (a rough sketch of this exchange follows below).
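
To make the flow concrete, here is roughly what the pre-flight exchange looks like, simulated with curl (hostnames reused from the example above; headers abbreviated):

    # Pre-flight request that the browser sends to the cross-origin
    curl -i -X OPTIONS https://www.other.com/some-resource \
        -H "Origin: https://www.example.com" \
        -H "Access-Control-Request-Method: GET"

    # If allowed, the response carries CORS headers such as:
    #   Access-Control-Allow-Origin: https://www.example.com
    #   Access-Control-Allow-Methods: GET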

From the S3 perspective, let’s see the same-origin access of different files :-

  • Say we have an S3 bucket which is enabled as a static website. We hit the static site hosted on this S3 bucket, and the website returns the ‘index.html’ file.
  • Next, say this ‘index.html’ file asks to fetch a different file from the same origin (the same S3 bucket). Since the other file is also present on the same origin (i.e. the same S3 bucket), the file can be accessed without issue.

Let’s now create the following files, “index.html” and “extra-page.html”, and upload them to the S3 bucket :-

Upon accessing the URL of the website, it is accessed successfully, because both files are present on the same origin, i.e. the same S3 bucket.

Also, upon accessing the URL of the other file directly, it is accessible as well :

Next, let’s delete the file named “extra-page.html” from our first bucket and again try to access the static website. We note that, since the referenced file no longer exists on this S3 bucket, an error appears for that part.

From the S3 perspective, let’s see the cross-origin access of different files :-

  • Say we have an S3 bucket which is enabled as a static website. We hit the static site hosted on this S3 bucket, and the website returns the ‘index.html’ file.
  • Next, say this ‘index.html’ file asks to fetch a different file from a different origin (i.e. a different S3 bucket). If the other bucket is configured with the right CORS headers, the web browser will be able to make the request. If not, the browser will not be able to request the files present in the other S3 bucket.

Let’s now create a new S3 bucket. This bucket has all public access enabled and a policy allowing the S3 ‘GetObject’ API call. Onto this S3 bucket, we upload the “extra-page.html” file.

Now, accessing the “extra-page.html” file through the website URL looks something like this. Please note that our second bucket powers this second static website, and there is only one file present in this second bucket, named “extra-page.html”.

Next, let’s create the following “index.html” file and upload it to our first S3 bucket :-

Upon accessing the URL of the website (i.e. through the default index.html file), it is accessed successfully, but with a CORS error, as shown below. The reason is that our browser is currently sitting on domain-1 (http://adityas-first-s3-bucket.s3-website.ap-south-1.amazonaws.com/) and is trying to access a resource lying on another domain (http://adityas-second-s3-bucket.s3-website.ap-south-1.amazonaws.com/), and no CORS header ‘Access-Control-Allow-Origin’ is present on the accessed resource, i.e. (http://adityas-second-s3-bucket.s3-website.ap-south-1.amazonaws.com/).

So, now let’s proceed and set up the CORS headers on the second S3 bucket, so as to allow the first bucket’s origin to make requests to this second bucket :-
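
A representative CORS configuration of this kind, applied via the CLI to the second bucket (bucket and origin names from this demo; the exact JSON entered in the console may differ):

    # Allow GET requests coming from the first bucket's website origin
    aws s3api put-bucket-cors \
        --bucket adityas-second-s3-bucket \
        --cors-configuration '{
          "CORSRules": [
            {
              "AllowedOrigins": ["http://adityas-first-s3-bucket.s3-website.ap-south-1.amazonaws.com"],
              "AllowedMethods": ["GET"],
              "AllowedHeaders": ["*"]
            }
          ]
        }'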

Upon accessing the URL of the website hosted on the first S3 bucket (i.e. through the default index.html file), there is no CORS issue now. Please note that the “extra-page.html” file here has been loaded from the other origin, i.e. the other S3 bucket. The same can be verified from the response headers of this file: we see below that the header “Access-Control-Allow-Origin” comes in the response from the second bucket itself (i.e. when the resource from the second bucket is accessed).

Introduction to AWS CLI :- Let’s now configure the AWS CLI on our local computer :-

We now configure the AWS CLI using the access key & secret key :-
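
The command itself is simply the following; it then prompts interactively for the credentials:

    # Prompts for access key id, secret access key, default region and output format
    aws configure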

Next, we can list all the buckets in our account :-
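
The likely command for this step:

    # List every bucket in the account
    aws s3 ls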

Next, we can list all the objects in a particular bucket by using the following command :-
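
The likely command (bucket name taken from this post):

    # List the objects inside a particular bucket
    aws s3 ls s3://adityas-first-s3-bucket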

Next, we can copy files present in the S3 bucket to the local machine :-
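
A sketch of the copy command (the key name is taken from the earlier upload in this demo):

    # Copy an object from the bucket to the current local directory
    aws s3 cp s3://adityas-first-s3-bucket/S3-bucket-picture.png .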

Next, let’s create a new S3 bucket in our AWS account using the CLI :-
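
A sketch of the command (the new bucket name here is hypothetical):

    # Create a new bucket in the ap-south-1 region
    aws s3 mb s3://adityas-new-s3-bucket --region ap-south-1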

Next, let’s delete the empty S3 bucket from our AWS account using the CLI :-
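
And the corresponding removal command (works only on an empty bucket; same hypothetical name as above):

    # Remove the (empty) bucket
    aws s3 rb s3://adityas-new-s3-bucket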

That’s all for this blog. We will meet again in the next part of this series of blogs on AWS developer expertise.


Written by aditya goel

Software Engineer for Big Data distributed systems
