
Download Data to instances

You can download data to your GPU/CPU-powered instances in several ways. Let's look at some options.

  • wget (simple and fast)
  • CurlWget browser extension
  • Kaggle API

Downloading using wget

Open a terminal from JupyterLab and run wget followed by the link to download any publicly available dataset.

# Downloading CIFAR10 dataset
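The download-and-extract step can be sketched as a small shell function. It assumes the Python version of CIFAR-10 hosted on the dataset's University of Toronto homepage; `fetch_and_extract` and the `data` folder are hypothetical names, so swap in your own link and destination.

```shell
#!/usr/bin/env bash
# Assumed link: the Python-version CIFAR-10 archive from the University of Toronto.
CIFAR10_URL="https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"

# fetch_and_extract is a hypothetical helper: download URL into DEST, then unpack it.
fetch_and_extract() {
  local url="$1"
  local dest="${2:-data}"
  local archive="$dest/$(basename "$url")"

  mkdir -p "$dest"
  # -c resumes a partially downloaded file; -P sets the target directory.
  wget -c -P "$dest" "$url" || return 1

  # Unpack common archive formats next to the download.
  case "$archive" in
    *.tar.gz|*.tgz) tar -xzf "$archive" -C "$dest" ;;
    *.tar)          tar -xf "$archive" -C "$dest" ;;
    *.zip)          unzip -o -q "$archive" -d "$dest" ;;
    *)              echo "Downloaded $archive (not an archive, left as-is)" ;;
  esac
}

# Usage: fetch_and_extract "$CIFAR10_URL" data
```

For CIFAR-10 this leaves the archive plus an extracted `cifar-10-batches-py` folder under `data/`. Thanks to `-c`, rerunning the command after a dropped connection resumes the download instead of starting over.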

Downloading using CurlWget

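CurlWget is a browser extension that turns a download started in your browser into a ready-to-run wget/curl command, including the cookies and headers needed for sources that require a login. This makes it handy for pulling data from Kaggle datasets, Google Drive, or other sources onto your instance: start the download in your browser, copy the command CurlWget generates, and paste it into the instance terminal.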
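If the cookies baked into a pasted command have expired, the server may return an HTML error or sign-in page, which wget/curl still save under the expected file name. A minimal check, assuming a real data file will not start with HTML markup; `verify_download` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# verify_download is a hypothetical helper: fail if FILE is missing, empty,
# or looks like an HTML page rather than the data you asked for.
verify_download() {
  local file="$1"
  if [ ! -s "$file" ]; then
    echo "missing or empty: $file" >&2
    return 1
  fi
  # Inspect only the first 512 bytes; -a lets grep scan binary content as text.
  if head -c 512 "$file" | LC_ALL=C grep -aqiE '<!doctype html|<html'; then
    echo "got an HTML page instead of data: $file" >&2
    return 1
  fi
  echo "ok: $file ($(wc -c < "$file") bytes)"
}

# Usage, after running the command copied from CurlWget:
#   verify_download archive.zip && unzip -q archive.zip
```

If the check fails, start the download again in the browser and copy a fresh command from CurlWget.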

Downloading data from Kaggle

Kaggle provides an API that lets you:

  • Download Kaggle datasets
  • Upload Kaggle datasets (useful when you want to upload your model)
  • Make Kaggle submissions

To learn more, see the official Kaggle API documentation.

Install Kaggle API

Open a terminal and install (or upgrade) the Kaggle CLI with pip:

pip install kaggle --upgrade

Setup Kaggle API

On Kaggle, go to the Account tab of your user profile and select Create API Token. This downloads kaggle.json, a file containing your API credentials. Place this file at ~/.kaggle/kaggle.json.

If you uploaded kaggle.json through JupyterLab, run the commands below from the folder that contains it to move it into place and make it readable only by you.

mkdir -p ~/.kaggle
mv kaggle.json ~/.kaggle/
chmod 600 ~/.kaggle/kaggle.json
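The same steps can be wrapped in a reusable function that fails loudly when the file is missing. `setup_kaggle_credentials` is a hypothetical name, and it assumes kaggle.json sits in the current directory unless you pass another path.

```shell
#!/usr/bin/env bash
# setup_kaggle_credentials is a hypothetical helper that moves kaggle.json
# into ~/.kaggle and makes it readable only by your user.
setup_kaggle_credentials() {
  local src="${1:-kaggle.json}"
  local dir="$HOME/.kaggle"

  if [ ! -f "$src" ]; then
    echo "kaggle.json not found at: $src" >&2
    return 1
  fi
  mkdir -p "$dir"                # -p: no error if the folder already exists
  mv "$src" "$dir/kaggle.json"
  chmod 600 "$dir/kaggle.json"   # owner read/write only
}

# Usage: setup_kaggle_credentials ~/kaggle.json && kaggle competitions list
```

Running `kaggle competitions list` afterwards is a quick way to confirm the credentials are picked up.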

Downloading Kaggle dataset

Open a terminal and run the command below, replacing feedback-prize-2021 with the name of the competition you are participating in.

kaggle competitions download -c feedback-prize-2021
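The competition files arrive as a single zip archive named after the competition, so a typical follow-up is to unpack it. A minimal sketch, assuming you have already accepted the competition rules on the Kaggle website (otherwise the API refuses the download); `get_competition_data` is a hypothetical helper, and it falls back to Python's built-in `zipfile` command line when `unzip` is not installed.

```shell
#!/usr/bin/env bash
# get_competition_data is a hypothetical helper: download all files of a
# competition with the Kaggle CLI and unpack them into DEST/COMPETITION.
get_competition_data() {
  local comp="$1"
  local dest="${2:-data}"
  mkdir -p "$dest"

  # -p chooses the download folder (defaults to the current directory).
  kaggle competitions download -c "$comp" -p "$dest" || return 1

  local archive="$dest/$comp.zip"
  if command -v unzip >/dev/null 2>&1; then
    unzip -q -o "$archive" -d "$dest/$comp"
  else
    # Python's standard library ships a small zip command-line tool.
    python3 -m zipfile -e "$archive" "$dest/$comp"
  fi
}

# Usage: get_competition_data feedback-prize-2021 data
```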

Uploading Kaggle dataset

You can use the Kaggle API to upload data from your instance as a Kaggle dataset. First, generate a metadata file for the dataset folder:

kaggle datasets init -p /path/to/dataset

This creates a dataset-metadata.json file inside the dataset folder. Edit the 'id' and 'title' fields in that file, then create the dataset:

kaggle datasets create -p /path/to/dataset
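Editing the metadata by hand works fine; to script it instead, one option is a short Python call from the shell, assuming `python3` is available on the instance. `set_dataset_metadata` and the example id and title are hypothetical; the id takes the form `your-username/dataset-slug`.

```shell
#!/usr/bin/env bash
# set_dataset_metadata is a hypothetical helper: overwrite the "id" and "title"
# fields of DIR/dataset-metadata.json, keeping every other field as-is.
set_dataset_metadata() {
  local dir="$1" ds_id="$2" title="$3"
  python3 -c '
import json, sys
path, ds_id, title = sys.argv[1:4]
with open(path) as f:
    meta = json.load(f)
meta["id"] = ds_id
meta["title"] = title
with open(path, "w") as f:
    json.dump(meta, f, indent=2)
' "$dir/dataset-metadata.json" "$ds_id" "$title"
}

# Usage:
#   kaggle datasets init -p /path/to/dataset
#   set_dataset_metadata /path/to/dataset "your-username/my-model-weights" "My Model Weights"
#   kaggle datasets create -p /path/to/dataset
```

Editing the JSON with a parser rather than `sed` keeps the file valid even when the title contains quotes or slashes.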

Update an existing dataset

kaggle datasets version -p /path/to/dataset -m "Updated data"
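Subfolders inside the dataset folder are skipped by default; the Kaggle CLI's `--dir-mode` option (`skip`, `zip`, or `tar`) controls this. A sketch assuming your installed CLI version supports that flag (check `kaggle datasets version -h`); `push_dataset_version` is a hypothetical helper that also fills in a timestamped message when you don't pass one.

```shell
#!/usr/bin/env bash
# push_dataset_version is a hypothetical helper around `kaggle datasets version`.
# Assumption: your kaggle CLI supports --dir-mode (skip | zip | tar).
push_dataset_version() {
  local dir="$1"
  # Default message is a UTC timestamp, e.g. "Update 2024-01-31T12:00Z".
  local msg="${2:-Update $(date -u +%Y-%m-%dT%H:%MZ)}"
  # --dir-mode zip uploads subfolders as zip archives instead of skipping them.
  kaggle datasets version -p "$dir" -m "$msg" --dir-mode zip
}

# Usage: push_dataset_version /path/to/dataset "Added fold 5 checkpoints"
```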

Uploading to Google Drive

You can upload files to Google Drive from the instance terminal with the gdrive command-line tool.

  1. Download the latest release of gdrive with wget from the gdrive releases page.
  2. Extract the archive using the following command.
     tar -xvf gdrive_2.1.1_linux_386.tar.gz
  3. Run the following command to start authentication.
     ./gdrive about
  4. Open the authentication URL shown in the terminal and choose your Google Drive account. Once you authorize the app, copy the generated auth token and paste it into the terminal.
  5. You can now upload files to Google Drive using the command below.

./gdrive upload /home/{file path}
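To upload a whole folder, such as training outputs, one option is to pack it into a single archive first so only one file goes through `gdrive upload`. A sketch, assuming the gdrive binary from the steps above sits in the current directory; `GDRIVE_BIN` (an override for that path) and `archive_and_upload` are hypothetical names.

```shell
#!/usr/bin/env bash
# Path to the gdrive binary extracted earlier; GDRIVE_BIN is a hypothetical override.
GDRIVE_BIN="${GDRIVE_BIN:-./gdrive}"

# archive_and_upload is a hypothetical helper: tar+gzip SRC_DIR, then upload it.
archive_and_upload() {
  local src="${1%/}"                               # drop a trailing slash
  local archive="${2:-$(basename "$src").tar.gz}"
  # -C keeps paths inside the archive relative to the folder's parent.
  tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
  "$GDRIVE_BIN" upload "$archive"
}

# Usage: archive_and_upload /home/outputs   # creates and uploads outputs.tar.gz
```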