Downloading files from Databricks’ DBFS

Guilherme Banhudo
3 min read · Jan 11, 2023


A quick tutorial on how to access your DBFS instance to download files solely via your browser.

Photo by Erol Ahmed on Unsplash

More often than not, you may want to download data from your Databricks instance. And whilst Databricks provides a UI for retrieving your DataFrame results, sometimes the data you generate is not directly tied to a DataFrame. Typical use cases include simulation results, generated text files, or even your DataFrame's schema.

The Databricks UI for downloading DataFrame results

By default, Databricks does not provide a way to remotely access or download arbitrary files within DBFS. In this quick guide, I'll show you how to access your DBFS data in two minutes without any external tools, relying simply on your browser.

1. Storing our output into a file in DBFS

Consider dumping a DataFrame schema into a text file so you can process it without being constrained by Databricks' truncated cell output:

base_data: DataFrame = spark.read.json([…])
base_schema: str = str(base_data.schema)

Start by writing a file to DBFS we want to download:

# dbutils.fs paths are rooted at DBFS, so use /FileStore, not /dbfs/FileStore
dbutils.fs.put("/FileStore/schema_output.txt", base_schema, overwrite=True)

Note: it is important to place the file under the /FileStore/{your_path} path; the reasoning behind this will be explored in the second step.
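To make the /FileStore requirement hard to get wrong, the target path can be built with a small helper. The function name below is illustrative, not part of any Databricks API:

```python
def filestore_path(relative_path: str) -> str:
    """Build a DBFS path under /FileStore, the only folder
    exposed for download via the /files endpoint.

    relative_path: path relative to FileStore,
    e.g. "schema_output.txt" or "schema/schema_output.txt".
    """
    return "/FileStore/" + relative_path.lstrip("/")
```

From a notebook, the file can then be written with, for example, dbutils.fs.put(filestore_path("schema_output.txt"), base_schema, overwrite=True).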

2. Downloading the file from DBFS

Databricks does not allow downloading data directly via the DBFS Data UI widget; however, the data within the FileStore folder is exposed via an HTTP endpoint, and that is exactly how we will access our file.

The Databricks Data > DBFS management widget

2.1 Fetch our Databricks tenant instance URL

Retrieve your Databricks instance URL by accessing the Databricks workspace within your cloud provider. For the sake of this tutorial we will use Azure; however, keep in mind the process is similar for all providers.

Considering the following example URL, we are interested in two portions:

https://adb-12345.11.azuredatabricks.net/?o=12345#notebook/9999111/command/1111

  • The instance address: adb-12345.11.azuredatabricks.net
  • (Optional) The o query parameter, which holds the workspace ID: 12345

2.2 Create your GET request to the file system endpoint

The /files endpoint makes the contents of the FileStore folder available via a GET request, or simply by opening the URL in your browser.

Note: again, keep in mind the file must reside within the FileStore folder or one of its subfolders.

In step 1 we stored our file at the DBFS path:

/FileStore/schema_output.txt

Hence, to access the file, we insert the path directly into the URL, replacing the /FileStore prefix with /files:

https://adb-12345.11.azuredatabricks.net/files/schema_output.txt?o=12345

Similarly, had we stored our file at the path:

/FileStore/schema/schema_output.txt

We could access the file via the URL:

https://adb-12345.11.azuredatabricks.net/files/schema/schema_output.txt?o=12345
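The path-to-URL mapping described above can be expressed as a small helper. The function name is hypothetical and the host is the example value from this tutorial, not a real workspace:

```python
def download_url(instance: str, dbfs_path: str, workspace_id: str = "") -> str:
    """Translate a DBFS /FileStore path into its browser download URL.

    instance:     Databricks instance host, e.g. "adb-12345.11.azuredatabricks.net"
    dbfs_path:    DBFS path of the file; must start with /FileStore/
    workspace_id: optional value of the "o" parameter from the workspace URL
    """
    prefix = "/FileStore/"
    if not dbfs_path.startswith(prefix):
        raise ValueError(f"file must live under {prefix}, got: {dbfs_path}")
    # /files is rooted at FileStore, so strip the prefix before appending
    url = f"https://{instance}/files/{dbfs_path[len(prefix):]}"
    if workspace_id:
        url += f"?o={workspace_id}"
    return url
```

For example, download_url("adb-12345.11.azuredatabricks.net", "/FileStore/schema/schema_output.txt", "12345") reproduces the URL shown above.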

Troubleshooting

Incorrect path error

One of the most frustrating errors — and the most cryptic — is the incorrect path error. Make sure you don’t include the FileStore folder in the access path.

HTTP ERROR: 404

Problem accessing /files/FileStore/schema_output.txt. Reason:
Bad Target: GET FileStore/schema_output.txt
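This 404 occurs because the /files endpoint is already rooted at FileStore, so a URL such as /files/FileStore/… points one folder too deep. A hypothetical sanity check that strips the repeated segment:

```python
def fix_files_url(url: str) -> str:
    """Remove a mistakenly repeated FileStore segment from a /files URL."""
    return url.replace("/files/FileStore/", "/files/", 1)
```

Applied to a correct URL, the function leaves it unchanged.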
