Using gcsfuse to mount GCS bucket
Migration to Cloud
As your applications scale it becoomes tedious, expensive and error prone to have it all on your data centers. Modern microservices are built with the philosophy of isolation. Scaling up and down of those services should happen with a snap of a finger (enter containerization technologies like dockers replacing traditional VMs), you scale up to serve bigger traffic and also to make it fault tolerant. But even if you have 20 docker pods, a failure in data center as simple as switch failure can bring your whole infrastructure to its knees. Thus the adoption of cloud. (There are numerous other reasons as well.)
In cloud everything becomes a service. Computation is a service, storage is a service and so on. While migrating traditional applications to cloud we need to gracefully preserve the integrity of the applications’ storage, compute and other components.
Google Cloud Storage
GCS is the storage ‘cloud service’ provided by google. Unlike traditional NFS you organize your data into buckets (logical entities) and the ‘cloud’ handles all other underlying infrastructure. Your application can read from or write to buckets using standard APIs. For an application which heavily depends on file read/write this can quickly become tedious. Also when you are moving to cloud from traditional environments there is a high chance that there are numerous applications which share the common storage and their subsequent operations depend on that.
GCS as Traditional NFS
What if we could have those storage buckets abstracted as traditional NFS? Applications would continue reading/writing with a now found easy abstraction that it already knows. Well, there’s a tool called gcsfuse
which does exactly this.
In this article I will quickly create a sample storage bucket, add some files to it, mount this bucket on my local linux machine and open those files (read/write). For my local machine it wouldn’t make a difference if they were hosted on some faraway zone.
Creating a gcp project with gcs and storage buckets
Loginto console.cloud.google.com and create a project
Create bucket under the tab ‘storage’ (That’s google cloud storage) [Note down the id as this will be useful later on]
Create some folders/files in it
Local gcloud setup
gcloud is a CLI based tool to interact with the google cloud services. Practically in most cases you end up using GUI/web tools to interact but for the purpose of this demo, I am using gcloud CLI tool.
Install
gcloud
Installation guideAuthenticate
gcloud auth application-default login
This launches your default browser to authenticate. On successful authentication credentials are stored at:
~/.config/gcloud/application_default_credentials.json
You can get your proper credential file and store in this location if you are in a non GUI environment
Installation of gcsfuse
tldr; (debian)
export GCSFUSE_REPO=gcsfuse-`lsb_release -c -s`
echo "deb http://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
sudo apt-get update
sudo apt-get install gcsfuse
mounting
Create a folder where you want to mount and cd to it
Run
gcsfuse --implicit-dirs dentai-bucket1 .
(dentai-bucket1 is my bucket id)If your gcloud authentication was successful and bucket id is proper, you should see something like:
Using mount point: /home/sudipbhandari/sudip/vats
Opening GCS connection...
Opening bucket...
Mounting file system...
File system has been successfully mounted.
If you
ls
you won’t actually see anthing at this moment even if you have some stuff in your bucket because the host system yet doesn’t have those objects. However, if you just create the directory (mkdir) and cd into it, it will actually sync with your bucket and the contents start to appear. –implicit-dirsEmpty
ls
output
~/sudip/vats » ls sudipbhandari@sudipbhandari-Latitude-5480
---------------
- Creation of images folder (this actually exists in my bucket with real contents but locally I am just creating the directory and nothing more)
~/sudip/vats » mkdir images sudipbhandari@sudipbhandari-Latitude-5480
----------------------------------------------------------------------------------------------------------------------------------------------
~/sudip/vats » cd images sudipbhandari@sudipbhandari-Latitude-5480
----------------------------------------------------------------------------------------------------------------------------------------------
~/sudip/vats/images » ls sudipbhandari@sudipbhandari-Latitude-5480
sanjayyriit yella016
Now gcsfuse has abstracted the remote gcs bucket as if it was my local NFS point. Neat!!
- I can now write to this directory and copy files. These will be synced to my bucket on the cloud.
~/sudip/vats » cp ~/Downloads/startupofyou.mobi .
On the cloud
Performance and other considerations
Writing files to and reading files from GCS has a much higher latency than using a local file system. If you are reading or writing one small file at a time, this may cause you to achieve a low throughput to or from GCS. If you want high throughput, you will need to either use larger files to smooth across latency hiccups or read/write multiple files at a time.