Deep dive into S3 for devs | Part 5

aditya goel

In case you are landing here directly, it is recommended to first visit this page. In this section, we would deep dive into interacting with AWS S3 from the AWS CLI, Python and Java.

First Step :- Let’s first create a new S3 bucket in our own AWS account. Note that we are creating our bucket in the “us-east-2” AWS region.

Second Step :- We would also specify tags, in order to remember the purpose of this bucket.

Third Step :- Next, we shall be creating an access policy :-

So, basically, we have granted three permissions in this policy :-

Next, we would be associating the ARN of our S3 bucket with this policy :-

And finally, our new customised policy has been created. We shall be attaching this policy to the new IAM user that we create later in this blog.

Fourth Step :- Next, let’s create a fresh IAM user, to which we would only be allowing programmatic access.

We would be attaching the newly created policy with this user.

Let’s also attach tags for future reference. Note that we can put anything in the tags.

Finally, our new IAM user has now been created. Let’s go ahead and download the access-key & secret-key.

Note that the above IAM user only has limited access to the particular S3 bucket.

Details about our S3 bucket configuration :-

Versioning → Versioning allows us to store historical instances of our binary object so that we can always go back in time to a previous version. Of course, this increases our storage cost, but if we need the history, having it there is very powerful.

Server Access Logging → This provides detailed access logs of a bucket. This detailed logging can also provide insights on our data access patterns, (both from compliance and customer standpoint), that may help us to provide a better solution for our customers.

Static Website Hosting → It allows us to build a simple, static website using an S3 bucket as a store. Now, this isn't going to give us anything sophisticated from a web perspective, but if we need just a simple landing page or a couple very simple options, this might be a great one to explore.

Object-Level Logging → This uses CloudTrail, which means that there's going to be some significant cost associated with it. But we can log all access to, and behaviour of, an object from within this option.

Default Encryption → There are times when we want to store our data in an encrypted fashion, and actually, that's more often the case than not. Many times in a business environment, we shall have a requirement that any data we store publicly, or in a public infrastructure that we don't control, must be encrypted. And this option gives us the ability to do that, if we choose to.

Object Lock → It allows us to lock objects from deletion. Say, for instance, we publish newsletters using S3 and never want them deleted. This can be a great option to save us from bad code or even typos when dealing with objects in our S3 bucket. Do note, however, that object lock is only available at bucket-creation time and not afterwards, at least in the current incarnation of Amazon S3.

Tags → We've already talked about tags, and these are very powerful when filtering data through the AWS API.

Transfer Acceleration → It’s another one of those features that can cost us a little bit of money but may be really powerful. Transfer acceleration uses CloudFront behind the scenes to provide a CDN for our data. We're not going to use it in this blog, but do know that it's there. If we find that we have data stored in the United States, for instance, and a whole bunch of users in Africa, Asia or Europe that need access to that data, we may find this is a better option for our use cases than simply replicating our data.

Event Notification → The events option allows us to receive events when things happen within our bucket, say when people upload data or perform other operations. This can be really useful if we have an automated system that's supposed to produce a file every day: we may want to set up a notification to make sure that we got that file, and if we don't, go look at what's going on.

Requester Pays → The final option we're going to talk about here is Requester-Pays. This is a very interesting use case, in which the requester of the file pays the download fee through their AWS account, as opposed to us, the owner of the bucket, paying it. The only time I've ever seen this used is with a security firm that would generate massive reports but didn't want to pay for users to get access to them, so they put them in a bucket, used Requester-Pays, and those users had to download them that way, paying the transfer costs themselves.

Permissions Tab → Under this tab, we have all of this public access that we talked about earlier. We also have the ability to create access control lists, and these can be really critical if you only want certain applications or certain systems to access your bucket. There's bucket policy that you can write if you want to create your own, and then there's also CORS configuration if you need to modify that to download files from a separate system.

Management Tab → This tab has some really cool tools in it as well. Lifecycle is very powerful; I've used lifecycle rules to expire old objects, which is one such option. You can also use lifecycle to take data that's old and, instead of purging it, move it to a cheaper storage class for long-term retention.

Replication Rules → It gives us the ability to replicate our data across regions or within the same region based on the needs of our system. Now, analytics, metrics, and inventory provide us more insight into the data stored within our bucket. And access points provide robust network security within our S3 bucket. We're not going to mess with any of these at all. Just know that they're there, and each one of them has a learn more option that you can look at to get a lot more information than I just presented here.

Introduction to AWS CLI :- In this section, we introduce the AWS CLI for interacting with S3 :-

Step #1.) Configuring AWS CLI :- Let’s now configure the AWS CLI on our local computer. We are first logging into our AWS account with our root credentials (i.e. root access-key & root secret-key) :-

Step #2.) Next, we execute a command to check whether our AWS CLI got configured successfully or not.
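
A minimal sketch of these two steps, with placeholder credentials. The verification command shown here (aws sts get-caller-identity) is one common way to confirm which identity the CLI is using; the original post may use a different check :-

    aws configure
    # AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
    # AWS Secret Access Key [None]: ****************************************
    # Default region name [None]: us-east-2
    # Default output format [None]: json

    # Ask AWS which identity the CLI is now using, to confirm the configuration worked
    aws sts get-caller-identity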

Step #3.) Next, let’s go ahead and create a new bucket from the command line. Command “mb” stands for “Make-Bucket” :-
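
A sketch of the make-bucket command (the bucket name below is illustrative, not necessarily the one used in the original screenshots) :-

    # "mb" = make bucket, created in the us-east-2 region used throughout this blog
    aws s3 mb s3://aditya-bucket-demo-cli --region us-east-2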

Let’s verify, from the AWS console, the bucket that we just created. Note that, in this newly created bucket, objects can have public access :-

Step #4.) Next, let’s go ahead and delete the bucket that we just created. Command “rb” stands for “Remove-Bucket” :-
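
A sketch of the remove-bucket command on the same illustrative bucket :-

    # "rb" = remove bucket; this fails if the bucket still contains objects
    aws s3 rb s3://aditya-bucket-demo-cli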

Step #5.) Next, let’s go ahead and upload a file to our earlier S3 bucket through the command line :-
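
A sketch of the upload, assuming the earlier bucket is “aditya-bucket-demo-1” (the bucket referenced later in this blog) :-

    # "cp" copies the local file into the bucket; the local copy stays in place
    aws s3 cp environment.jpeg s3://aditya-bucket-demo-1/environment.jpeg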

Here, we can verify from the AWS console whether our object has duly reached the S3 bucket. Note that the file named “environment.jpeg” has indeed been uploaded to the S3 bucket.

Step #6.) Next, let’s go ahead and move a file from our local directory to the bucket. Command “mv” stands for “Move-Object”. This command moves the file from our local directory to the S3 bucket, and it is evident that the file is no longer present in our local directory.
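
A sketch of the move, using an illustrative file name :-

    # "mv" uploads the file and then removes it from the local directory
    aws s3 mv environment2.jpeg s3://aditya-bucket-demo-1/environment2.jpeg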

Step #7.) Next, let’s go ahead and delete a file from our S3 bucket. Command “rm” stands for “Remove-Object” :-
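
A sketch of the delete, again with illustrative names :-

    # "rm" deletes the object from the bucket
    aws s3 rm s3://aditya-bucket-demo-1/environment.jpeg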

Note that if we execute the aforesaid command again, it still completes without any error, even though the object no longer exists. That’s a caveat here.

Step #8.) Next, let’s go ahead and move a file from our S3 bucket to our local directory. Command “mv” stands for “Move-Object” :-

Let’s also verify that this file (environment2.jpeg) no longer exists in the S3 bucket.

Step #9.) Next, let’s again look at moving files from our S3 bucket to our local directory with the “mv” command :- This command actually moves the file from the S3 bucket (i.e. removes the file from there) to our local directory.
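
The general form of moving an object down from the bucket (names illustrative) :-

    # Download the object and remove it from the bucket in one step
    aws s3 mv s3://aditya-bucket-demo-1/environment2.jpeg ./environment2.jpeg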

Step #10.) Next, let’s go ahead and list all files present in our S3 bucket. Command “ls” stands for “List-Objects” :-
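
A sketch of the listing commands :-

    # List all objects in one bucket, and then all buckets in the account
    aws s3 ls s3://aditya-bucket-demo-1
    aws s3 ls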

Step #11.) Next, let’s go ahead and perform a batch-copy operation from our local directory to the S3 bucket. Command “sync” stands for “Sync-All-Objects” :-

First of all, note that we have the following 2 files present in our local directory, which we want to batch-copy to the S3 bucket.

Next, we perform the sync operation on this local directory, after which all files present in this directory shall reach the AWS S3 bucket.
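
A sketch of the sync, run from inside that local directory (bucket name as assumed earlier) :-

    # Sync the current directory to the bucket; only new or changed files are copied
    aws s3 sync . s3://aditya-bucket-demo-1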

The same can be verified from the S3 console. Both of the aforementioned files can now be viewed there, as below :-

Step #12.) Next, let’s visit a new use-case of the batch-copy operation from our local directory to the S3 bucket, again using the “sync” command :-

First of all, note that we have created a new file (“responder.json”) in this local directory and also modified the contents of the existing file “requestor.json” :-

Next, we perform the sync operation again on this local directory, after which the new and the modified files in this directory shall reach the AWS S3 bucket. Note that we are syncing from the local directory to the S3 bucket here; we can also perform the reverse sync, i.e. from the S3 bucket to the local directory.

Let’s verify the same from the AWS portal too :-

Step #13.) Next, let’s now visit yet another use-case of the batch-copy operation, this time from our S3 bucket to the local directory. Command “sync” shall be used as above :-

First of all, we go ahead and delete the file(“responder.json”) from our S3 bucket :-

Also note that, as of now we have following files in our local-directory :-

Next, perform the sync operation again from this S3-Bucket to the local-directory :-

Post the above operation, we expect that the file present in the S3 bucket, i.e. “1624590332699.jpeg”, is copied to our local directory. Note, however, that the file which we deleted (“responder.json”) from the S3 bucket has not been touched at all in our local directory :-

Next, say we want deletions in the S3 bucket to also be reflected in the local directory; then we should use the additional flag “--delete” :-
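
A sketch of both forms of the download-direction sync (bucket name as assumed earlier) :-

    # Sync the bucket to the current local directory
    aws s3 sync s3://aditya-bucket-demo-1 .

    # With --delete, files that no longer exist in the bucket are also removed locally
    aws s3 sync s3://aditya-bucket-demo-1 . --delete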

Step #14.) Next, let’s visit a new use-case of the batch-copy operation, from our source S3 bucket (“aditya-bucket-demo-1”) to the target S3 bucket (“aditya-bucket-demo-2”). Command “sync” shall be used for this purpose too :-
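
A sketch of creating the target bucket and syncing into it (the create step may equally be done from the console) :-

    # Create the target bucket, then sync every object from the source bucket into it
    aws s3 mb s3://aditya-bucket-demo-2 --region us-east-2
    aws s3 sync s3://aditya-bucket-demo-1 s3://aditya-bucket-demo-2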

Let’s verify the following things from the S3 console :-

  • Whether the Create-Bucket operation (for bucket : “aditya-bucket-demo-2”) is successful or not ?
  • Whether the sync operation from S3 bucket “aditya-bucket-demo-1” to S3 bucket “aditya-bucket-demo-2” is successful or not ?

Step #15.) Next, let’s go ahead and perform the cleanup operation. We shall first be deleting all the files recursively, using the “rm” command :-

We shall next be deleting the 2nd bucket (“aditya-bucket-demo-2”) that we just created, using the “rb” command :-
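
A sketch of the cleanup :-

    # Delete all objects in the second bucket, then delete the (now empty) bucket itself
    aws s3 rm s3://aditya-bucket-demo-2 --recursive
    aws s3 rb s3://aditya-bucket-demo-2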

Let’s now verify from the S3 console that the 2nd bucket (“aditya-bucket-demo-2”) no longer exists :-

Step #16.) Pre-signed URLs :- We can also generate time-bound pre-signed URLs to enable access to objects in private S3 buckets. Note here that we have enabled access to one particular object for 30 seconds :-
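
A sketch of the presign command (the object key is illustrative) :-

    # Generate a pre-signed URL for one object, valid for 30 seconds
    aws s3 presign s3://aditya-bucket-demo-1/environment2.jpeg --expires-in 30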

We can verify from our browser that, this pre-signed url is duly accessible :-

After the 30-second time period, we can observe that this pre-signed URL is no longer valid :-

Introduction to Boto3 library to interact with S3 :- In this section, we would introduce the Boto3 library (the AWS SDK for Python) for interacting with S3 :-

Step #1.) Configuring the Boto3 library :- Let’s now configure the AWS Boto3 library within our Jupyter notebook, in order to communicate with S3 from our machine :-

Step #2.) Importing necessary packages:- Let’s now import required packages :-

Step #3.) Initialising Boto3 Client :- Let’s now initialise the S3-client :-
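
A minimal sketch covering these first three steps (installation, imports and client initialisation). The credentials below are placeholders standing in for the access-key and secret-key of the IAM user created earlier :-

    # pip install boto3

    import boto3

    # Initialise the S3 client with the IAM user's programmatic credentials
    s3_client = boto3.client(
        's3',
        region_name='us-east-2',
        aws_access_key_id='YOUR_ACCESS_KEY',
        aws_secret_access_key='YOUR_SECRET_KEY',
    )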

Step #4.) Creating Bucket with Boto3 :- Let’s now create a brand-new S3 bucket with the help of the Boto3 client :-
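
A sketch of the bucket creation (the bucket name is illustrative) :-

    # Buckets outside us-east-1 need an explicit LocationConstraint
    s3_client.create_bucket(
        Bucket='aditya-bucket-demo-3',
        CreateBucketConfiguration={'LocationConstraint': 'us-east-2'},
    )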

Let’s now verify whether the bucket got created or not :- Note the 2nd row in the snapshot below; that’s the new bucket we have created :-

Step #5.) Retrieve list of Buckets with Boto3 :- Let’s now retrieve the list of all buckets that we have in our AWS account :-
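
A sketch of listing the buckets :-

    # list_buckets returns every bucket owned by this AWS account
    response = s3_client.list_buckets()
    for bucket in response['Buckets']:
        print(bucket['Name'])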

Let’s now go ahead and work with files using Boto3 library :-

Step #1.) Upload files to our S3 Bucket with Boto3 :- Let’s now upload some files to our S3 bucket; a sketch of the call follows the list below. Note here that :-

  • The first parameter indicates the path of the source-file, which has to be uploaded.
  • The second parameter indicates the name with which we want this file to be uploaded into S3 Bucket.
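
A sketch of the upload using the plain client; note that, with the client form shown here, there is additionally a bucket-name argument between the two parameters described above (file and bucket names are illustrative) :-

    # upload_file(local_source_path, bucket_name, target_object_key)
    s3_client.upload_file(
        'environment.jpeg',        # path of the local source file
        'aditya-bucket-demo-3',    # bucket to upload into
        'environment.jpeg',        # name (key) under which the object is stored
    )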

Let’s verify from our S3 bucket whether this particular file really got uploaded or not, and here it is :-

Step #2.) Download files from our S3 Bucket to our local directory with Boto3 :- Firstly, here is a snapshot of our current directory, showing all the files present here :-

Next, we go ahead and download the file from the S3 bucket to our local directory; a sketch of the call follows the list below :-

  • The first parameter indicates the name of the file (object) to be downloaded from the S3 bucket.
  • The second parameter indicates the local path, along with the name, under which the downloaded file shall be saved.
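
A sketch of the download using the plain client, which additionally takes the bucket name as its first argument (names illustrative) :-

    # download_file(bucket_name, object_key, local_destination_path)
    s3_client.download_file(
        'aditya-bucket-demo-3',           # bucket to download from
        'environment.jpeg',               # object (key) to download
        'environment-downloaded.jpeg',    # local path/name to save it under
    )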

We can now verify that a file with this name does exist locally now :-

Step #3.) Fetch List of files from S3 Bucket using Boto3 :- Let’s go ahead and list out all the files present in our S3 bucket :- At the time of executing this query, we have only 1 file present in S3 :-

Now, using Boto3, let’s list down all the files present in the S3 bucket :-
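
A sketch of the listing (bucket name illustrative) :-

    # list_objects_v2 returns the objects under 'Contents' (absent if the bucket is empty)
    response = s3_client.list_objects_v2(Bucket='aditya-bucket-demo-3')
    for obj in response.get('Contents', []):
        print(obj['Key'], obj['Size'])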

At the time this command is executed, we have only 1 file present in the S3 bucket, and the same is returned in the response as well :-

Step #4.) Copy files from one S3 Bucket to another using Boto3 :- Let’s now explore the option of copying files from one S3 bucket to another :- First we create a new S3 bucket, and then we copy the object from our source bucket into this destination bucket.
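
A sketch of both steps (all bucket and key names are illustrative) :-

    # Create the destination bucket, then copy the object across, server-side
    s3_client.create_bucket(
        Bucket='aditya-bucket-demo-5',
        CreateBucketConfiguration={'LocationConstraint': 'us-east-2'},
    )
    s3_client.copy_object(
        Bucket='aditya-bucket-demo-5',      # destination bucket
        Key='environment.jpeg',             # destination key
        CopySource={'Bucket': 'aditya-bucket-demo-3', 'Key': 'environment.jpeg'},
    )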

Step #5.) Delete list of files from our S3 Bucket using Boto3 :- Let’s go ahead and delete a list of objects from the S3 bucket :-
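
A sketch of both the single-object and the batch delete calls (names illustrative) :-

    # Delete a single object...
    s3_client.delete_object(Bucket='aditya-bucket-demo-3', Key='environment.jpeg')

    # ...or delete a list of objects in one call
    s3_client.delete_objects(
        Bucket='aditya-bucket-demo-3',
        Delete={'Objects': [{'Key': 'environment.jpeg'}]},
    )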

We can now verify from our S3 bucket that the deleted object no longer exists :-

Step #6.) Let’s now go ahead and set some permissions-restrictions while creating buckets :-

First, observe that, by default, the Block-Public-Access settings on a newly created bucket are OFF, i.e. the bucket could potentially be made publicly accessible.

Second, let’s now set the permissions, while creating the bucket :-
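
A sketch of one way to do this: create the bucket and then block all public access on it with a separate call (the original post may wire this up slightly differently; names illustrative) :-

    # Create the bucket, then block all four kinds of public access on it
    s3_client.create_bucket(
        Bucket='aditya-bucket-demo-6',
        CreateBucketConfiguration={'LocationConstraint': 'us-east-2'},
    )
    s3_client.put_public_access_block(
        Bucket='aditya-bucket-demo-6',
        PublicAccessBlockConfiguration={
            'BlockPublicAcls': True,
            'IgnorePublicAcls': True,
            'BlockPublicPolicy': True,
            'RestrictPublicBuckets': True,
        },
    )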

Lastly, let’s observe that the S3 bucket’s public-access permissions are all BLOCKED, i.e. this bucket doesn’t allow public access.

Step #7.) Let’s now go ahead and generate a pre-signed URL for a particular object in the S3 bucket using Boto3 :-
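
A sketch of the call (bucket and key names illustrative) :-

    # Pre-signed GET URL for one object, valid for 30 seconds
    url = s3_client.generate_presigned_url(
        ClientMethod='get_object',
        Params={'Bucket': 'aditya-bucket-demo-3', 'Key': 'environment.jpeg'},
        ExpiresIn=30,
    )
    print(url)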

Now, as soon as we access the file using this pre-signed URL, the file gets downloaded. Also note that this link shall be valid only for 30 seconds.

Step #8.) Let’s now go ahead and delete buckets from our AWS account :- First, let’s observe that we have 4 buckets in our AWS account :-

Now, let’s go ahead and delete the bucket(‘aditya-bucket-demo-4’) using boto3 library :-

Lastly, let’s observe again that we are now left with the following buckets in our AWS account :-

It’s essential for a bucket to be empty at the time of deleting it, and the same can be done using the below approach :-
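
A sketch of emptying and then deleting a bucket, using the “aditya-bucket-demo-4” bucket mentioned above (for buckets with more than 1000 objects, the listing would need to be paginated) :-

    # A bucket must be empty before it can be deleted: remove every object, then the bucket
    response = s3_client.list_objects_v2(Bucket='aditya-bucket-demo-4')
    for obj in response.get('Contents', []):
        s3_client.delete_object(Bucket='aditya-bucket-demo-4', Key=obj['Key'])

    s3_client.delete_bucket(Bucket='aditya-bucket-demo-4')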

Introduction to Java SDK library to interact with S3 :- In this section, we would introduce the AWS SDK for Java for interacting with S3 :-

Step #1.) Let’s first set up a simple Maven project on our local machine. The name of our project is “JavaBasedS3” :-
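
The project needs the S3 module of the AWS SDK for Java on its classpath. A sketch of the pom.xml dependency, assuming SDK v1 is used (the exact artifact and version in the original project may differ) :-

    <dependency>
        <groupId>com.amazonaws</groupId>
        <artifactId>aws-java-sdk-s3</artifactId>
        <!-- pick a recent 1.12.x release -->
        <version>1.12.100</version>
    </dependency>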

Step #2.) Let’s now set up a template for the logging pattern :-

Step #3.) Let’s create a new bucket using S3 SDK library :- For this purpose, we first do setup :-

Now, let’s go ahead and createBucket using this code-snippet :-
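
A sketch of the setup and the createBucket call, written against the AWS SDK for Java v1 (the original code may use a different SDK version; the class, method and credential values here are assumptions) :-

    import com.amazonaws.auth.AWSStaticCredentialsProvider;
    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.regions.Regions;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;

    public class S3Demo {

        // S3 client built with the IAM user's programmatic credentials (placeholders here)
        static final AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withRegion(Regions.US_EAST_2)
                .withCredentials(new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY")))
                .build();

        // Create a brand new bucket, if it doesn't already exist
        static void createBucket(String bucketName) {
            if (!s3.doesBucketExistV2(bucketName)) {
                s3.createBucket(bucketName);
            }
        }
    }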

The same can be verified from the AWS dashboard :- We have our new bucket created, thus :-

Step #4.) Let’s now upload some file into this new bucket using the S3 SDK library :- As a first step, here are some pointers that we define :-

Next, here is how the code for uploadFile looks like :-
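
A sketch of such a method, reusing the AmazonS3 client “s3” built in the earlier sketch (method and parameter names are assumptions) :-

    import java.io.File;

    // Upload a local file to the given bucket under the given key
    static void uploadFile(String bucketName, String keyName, String localFilePath) {
        s3.putObject(bucketName, keyName, new File(localFilePath));
    }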

As a next step, let’s invoke aforesaid code in order to upload the file :-

Finally, here we can verify that the following files have been duly uploaded to S3 :-

Step #5.) Let’s now download some file from our new bucket to our local directory using the S3 SDK library :- As a first step, let’s verify that we don’t have any file present in the download directory :-

Next, here are some pointers as we define :-

Next, here is how the code for downloadFile looks like :-
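
A sketch of such a method, again reusing the “s3” client from the earlier sketch :-

    import com.amazonaws.services.s3.model.GetObjectRequest;
    import java.io.File;

    // Download an object from the bucket and save it at the given local path
    static void downloadFile(String bucketName, String keyName, String localFilePath) {
        s3.getObject(new GetObjectRequest(bucketName, keyName), new File(localFilePath));
    }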

As a next step, let’s invoke the aforesaid code in order to download the file :-

Post executing the aforesaid code, here we can verify the files thus downloaded from S3, with the specified names :-

Step #6.) Let’s now delete some file from our new bucket using the S3 SDK library :- As a first step, let’s define the code for doing the same :-
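
A sketch of such a method (names are assumptions) :-

    // Delete a single object from the bucket
    static void deleteFile(String bucketName, String keyName) {
        s3.deleteObject(bucketName, keyName);
    }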

As a next step, let’s invoke the aforesaid code in order to delete the file :-

Post executing the aforesaid code, we can verify from the AWS dashboard that the file doesn’t exist anymore :-

Step #7.) Let’s now copy some file from our new bucket to the older bucket using the S3 SDK library. As a first step, let’s see that our target bucket (‘aditya-bucket-demo-11’) is empty right now :-

Here are some pointers :-

Next, here is how the code to copy the object from one bucket to another bucket, looks like :-
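
A sketch of such a method (SDK v1 style; names are assumptions) :-

    // Copy an object from the source bucket to the target bucket
    static void copyFile(String sourceBucket, String sourceKey,
                         String targetBucket, String targetKey) {
        s3.copyObject(sourceBucket, sourceKey, targetBucket, targetKey);
    }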

As a next step, we would be creating a new bucket (“aditya-bucket-demo-12”), uploading some file to this new bucket, and from there, we would then copy the file to the target bucket :-

Now, we would verify our artefacts in the new bucket (“aditya-bucket-demo-12”) :-

Also, here is the look of artefacts from our target-bucket(“aditya-bucket-demo-11”):-

Step #8.) Let’s now list down all the files we have in some specified bucket :- Here is how the code for the same looks :-
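
A sketch of such a listing method :-

    import com.amazonaws.services.s3.model.ListObjectsV2Result;
    import com.amazonaws.services.s3.model.S3ObjectSummary;

    // List all objects present in the given bucket
    static void listFiles(String bucketName) {
        ListObjectsV2Result result = s3.listObjectsV2(bucketName);
        for (S3ObjectSummary summary : result.getObjectSummaries()) {
            System.out.println(summary.getKey() + " (" + summary.getSize() + " bytes)");
        }
    }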

Let’s go ahead and now invoke this method :-

Step #9.) Let’s now modify the permissions for this bucket :- Note that, by default, public access to the bucket is not blocked, i.e. the bucket could be opened up for public access.

Next, let’s write the code to block the public access :-
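
A sketch of such a method (SDK v1 style) :-

    import com.amazonaws.services.s3.model.PublicAccessBlockConfiguration;
    import com.amazonaws.services.s3.model.SetPublicAccessBlockRequest;

    // Block every form of public access on the bucket
    static void blockPublicAccess(String bucketName) {
        s3.setPublicAccessBlock(new SetPublicAccessBlockRequest()
                .withBucketName(bucketName)
                .withPublicAccessBlockConfiguration(new PublicAccessBlockConfiguration()
                        .withBlockPublicAcls(true)
                        .withIgnorePublicAcls(true)
                        .withBlockPublicPolicy(true)
                        .withRestrictPublicBuckets(true)));
    }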

Lastly, let’s go ahead and invoke the above code-piece :-

Lastly, let’s verify from our AWS dashboard whether the public-access permissions got changed for this bucket or not :-

Step #10.) Let’s now generate the pre-signed URL for temporary access to an object :- First and foremost, here is the basic stuff required :-

Next, we would create the method to create pre-signed url :-
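
A sketch of such a method (SDK v1 style; the 30-second validity matches what the blog uses) :-

    import com.amazonaws.HttpMethod;
    import java.net.URL;
    import java.util.Date;

    // Generate a pre-signed GET URL that stays valid for the given number of seconds
    static URL generatePresignedUrl(String bucketName, String keyName, int validitySeconds) {
        Date expiration = new Date(System.currentTimeMillis() + validitySeconds * 1000L);
        return s3.generatePresignedUrl(bucketName, keyName, expiration, HttpMethod.GET);
    }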

Now, go ahead and invoke the above method :-

Lastly, let’s verify whether the preSignedURL is active and accessible :-

Now, go ahead and verify again, post 30 seconds, whether this URL is still accessible :- It won’t be, because we specified the validity of the URL as only 30 seconds.

Step #11.) Let’s now go ahead and delete the bucket :- First of all, let’s note that we have one object present in the bucket :-

As a rule of thumb, it’s essential that there be NO objects lying in the bucket before deleting the bucket itself, so let’s go ahead and delete all objects first and then delete the bucket itself :-
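
A sketch of such a cleanup method :-

    import com.amazonaws.services.s3.model.S3ObjectSummary;

    // Delete every object in the bucket first, then delete the (now empty) bucket itself
    static void deleteBucket(String bucketName) {
        for (S3ObjectSummary summary : s3.listObjectsV2(bucketName).getObjectSummaries()) {
            s3.deleteObject(bucketName, summary.getKey());
        }
        s3.deleteBucket(bucketName);
    }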

Let’s go ahead and invoke the aforementioned method :-

Lastly, let’s verify from the AWS dashboard that this bucket no longer exists :-

Thanks for reading through this blog. That’s all for now, and we will see you in the next part of this series.
