Real-Time Business Insights with Apache Druid and Apache Superset

Beata Zawiślak
June 26, 2025

Apache Druid is a powerful, high-performance real-time analytics database that can store valuable company data. However, the lack of built-in tools for data visualization makes it challenging for business users to extract actionable insights directly from the raw data.

Fortunately, Apache Superset – an open-source data visualization and exploration platform – seamlessly integrates with SQL-based databases, including Apache Druid. By combining Druid's fast, scalable data engine with Superset's intuitive visualization capabilities, data and business analysts can more easily uncover insights and drive value from the data.

Below is a simple step-by-step guide to help you connect your Apache Druid instance with Apache Superset and build a sample dashboard. We hope this tutorial proves helpful and empowers your analytics team to create insightful dashboards – ultimately enabling smarter, data-driven business decisions.

Step-by-Step: Connecting Apache Druid to Apache Superset

Starting Apache Superset

1. Create a dedicated folder for a demo:

mkdir demo
cd demo

2. Clone the Superset repository:

git clone --depth=1 https://github.com/apache/superset.git

3. Change into the repository's root directory:

cd superset

4. Add the required Apache Druid driver (pydruid):

echo "pydruid" >> ./docker/requirements-local.txt
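Note that >> appends on every run, so repeating this step leaves duplicate lines in the file. A minimal Python sketch of an idempotent variant is shown below; ensure_requirement is a hypothetical helper name, not part of Superset, and the path is the one used in the command above.

```python
from pathlib import Path


def ensure_requirement(path: str, package: str) -> bool:
    """Append `package` to a requirements file unless it is already listed.

    Returns True if the file was modified, False otherwise.
    Hypothetical helper for illustration; not part of Superset.
    """
    req_file = Path(path)
    text = req_file.read_text() if req_file.exists() else ""
    if package in (line.strip() for line in text.splitlines()):
        return False  # already present, nothing to do
    req_file.parent.mkdir(parents=True, exist_ok=True)
    # Start on a fresh line if the existing content lacks a trailing newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with req_file.open("a") as fh:
        fh.write(prefix + package + "\n")
    return True
```

Calling ensure_requirement("./docker/requirements-local.txt", "pydruid") from the repository root has the same effect as the echo command on the first run and changes nothing on later runs.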

5. Run the Superset application:

docker-compose build --force-rm
docker-compose up

6. In a second terminal (docker-compose up keeps running in the foreground), check the status of the containers:

docker ps

Starting Apache Druid

1. Download Apache Druid from the official website. This tutorial uses the Apache Druid 32.0.0 release:

wget https://archive.apache.org/dist/druid/32.0.0/apache-druid-32.0.0-bin.tar.gz

2. Extract the binaries:

tar -xzf apache-druid-32.0.0-bin.tar.gz

3. Configure Ports (If Needed)

By default, Apache Druid uses some ports that may conflict with Apache Superset, particularly port 8081, which Superset may use for proxying. If you encounter any issues, follow the steps below to adjust the configuration:

A. Change the Coordinator Port (Default: 8081)

Open the following file:

/apache-druid-32.0.0/conf/druid/auto/coordinator-overlord/runtime.properties

Set the druid.plaintextPort property (line 21 in this release) as follows:

druid.plaintextPort=8073

B. Change the ZooKeeper AdminServer Port (Default: 8080)

Apache ZooKeeper’s embedded AdminServer binds to port 8080 by default. If it conflicts with other services or causes errors, change the port in:

/apache-druid-32.0.0/conf/zk/zoo.cfg

Override the default port by adding a line such as the following to the configuration:

admin.serverPort=8077

C. Skip Port Checks (Optional)

To bypass port availability checks when using custom ports, set the following environment variable before starting Druid:

export DRUID_SKIP_PORT_CHECK=1
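Before changing any configuration, it can help to check which of the ports mentioned in this guide already have a listener. The following is a small Python sketch using only the standard library; port_in_use and busy_ports are hypothetical helper names, and the port list in the usage note is simply the set of ports that appear in this tutorial.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0


def busy_ports(ports, host: str = "127.0.0.1"):
    """Return the subset of `ports` that already have a listener."""
    return [p for p in ports if port_in_use(p, host)]
```

For example, busy_ports([8080, 8081, 8082, 8088, 8888]) returns the ports from that list that are currently taken on the local machine. Run it before starting Druid to spot conflicts, or afterwards to confirm that the services are listening.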

4. Start Apache Druid Locally

Navigate to the extracted directory and start Druid:

cd apache-druid-32.0.0
./bin/start-druid

Then open your browser and go to:

http://localhost:8888/

This opens the Apache Druid console. If you see the UI, Druid is running successfully.
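The same check can be scripted. Druid processes expose a /status/health endpoint that returns true when the process is up; the sketch below queries it through the address used above. The function names are hypothetical, and the timeout is an arbitrary choice.

```python
import json
import urllib.request


def parse_health(body: bytes) -> bool:
    """Interpret the response body: a healthy process answers with JSON true."""
    try:
        return json.loads(body.decode("utf-8")) is True
    except ValueError:
        # Covers invalid UTF-8 and bodies that are not JSON.
        return False


def druid_is_healthy(base_url: str = "http://localhost:8888",
                     timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/status/health reports a healthy process."""
    url = base_url.rstrip("/") + "/status/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return parse_health(resp.read())
    except OSError:
        # Connection refused, timeout, or an HTTP error status.
        return False
```

Calling druid_is_healthy() with no arguments targets http://localhost:8888; pass a different base URL if you changed the ports in the previous step.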

5. Load a Sample Dataset

To ingest a sample dataset (such as the Wikipedia example) for use with Apache Superset, follow the instructions in the Apache Druid Quickstart (local) tutorial.
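Once ingestion has finished, one way to confirm that the table exists is to ask Druid's SQL API directly. The sketch below posts a query to /druid/v2/sql, the same path that appears in the SQLAlchemy URI later in this guide, through the address used for the console above. Helper names are hypothetical; the sketch assumes the default response format, in which each row is a JSON object keyed by column name.

```python
import json
import urllib.request

# Lists every table in Druid's default 'druid' schema.
TABLES_QUERY = (
    "SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES "
    "WHERE TABLE_SCHEMA = 'druid'"
)


def build_sql_request(query: str, base_url: str = "http://localhost:8888"):
    """Build a POST request for Druid's SQL endpoint (/druid/v2/sql)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/druid/v2/sql",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def table_names(rows):
    """Extract table names from the object-per-row JSON response."""
    return [row["TABLE_NAME"] for row in rows]


def list_druid_tables(base_url: str = "http://localhost:8888",
                      timeout: float = 10.0):
    """Return the names of all tables in the 'druid' schema."""
    request = build_sql_request(TABLES_QUERY, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return table_names(json.loads(resp.read().decode("utf-8")))
```

With Druid running and the Quickstart ingestion complete, the list returned by list_druid_tables() should include wikipedia.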

Connecting Druid to Superset

1. Go to http://localhost:8088/ in your browser:

2. Use default credentials to log in:

login: admin

password: admin

3. Set up the connection between Apache Superset and Apache Druid:

4. Choose Apache Druid database:

5. Provide a valid SQLAlchemy URI to establish the connection.

The general connection URL for open-source Apache Druid is:

druid://<broker_ip_address>:8082/druid/v2/sql

In our case, since Apache Superset is running in Docker and Apache Druid is running directly on the host machine, use 'host.docker.internal' as the hostname:

SQLAlchemy URI: druid://host.docker.internal:8082/druid/v2/sql

In the bottom-right corner, you should see a message confirming that the connection is set up correctly.
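The URI can also be assembled programmatically, which helps when the host or port differs between environments. The sketch below follows the pattern shown above; the optional credentials and the druid+https scheme are assumptions that go beyond this tutorial's unauthenticated local setup, and druid_sqlalchemy_uri is a hypothetical helper name.

```python
from urllib.parse import quote


def druid_sqlalchemy_uri(host: str, port: int = 8082, user: str = None,
                         password: str = None, https: bool = False) -> str:
    """Build a SQLAlchemy URI for Druid's SQL endpoint.

    Credentials and HTTPS are optional extras (assumptions, not used in
    this tutorial); by default the result matches the plain druid:// form.
    """
    scheme = "druid+https" if https else "druid"
    auth = ""
    if user is not None:
        # Percent-encode credentials so characters like '@' or ':' are safe.
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"{scheme}://{auth}{host}:{port}/druid/v2/sql"
```

Calling druid_sqlalchemy_uri("host.docker.internal") reproduces the URI used in this tutorial.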

6. You can always click the settings icon in the top-right corner of the screen to view your current database connections:

Querying Druid from Superset

1. Go to SQL Lab to query Apache Druid from Apache Superset. You can access it either from the menu where the database connection was made or by clicking the SQL card at the top to go directly to SQL Lab:

2. Choose the Apache Druid database to query:

3. Run example queries as follows:

SELECT *
FROM wikipedia;

As you may notice, the default record limit is set to 1,000. You can easily adjust this limit using the input field next to the Run button.

Superset also displays the query execution time, allows you to download results as a CSV file, and provides a query history so you can revisit previous queries.

SELECT channel, COUNT(*) AS "Rows per channel"
FROM wikipedia
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;

SQL Lab also allows you to create a Chart directly from that view. These charts can then be used while building the Dashboard.

Note: To create the chart, the SQL query must not include any aliases. Use the query below if you want to create a chart directly from this panel:

SELECT channel, COUNT(*)
FROM wikipedia
GROUP BY 1
ORDER BY 2 DESC;

Creating a Chart for a Chosen Dataset from Druid in Superset

You can create a chart from an Apache Druid dataset in two different ways:

A. From SQL Lab (Query-Based Chart)

Click the Create Chart button available in SQL Lab beneath the executed query:

You should see a chart creation window like the one below:

When you create a chart from a selected query in the SQL Lab section, the chart's data source is set to that specific query.

This means:

  • The chart can only use the columns and rows that the query returns.
  • Filters, Ordering, and Row limit still apply, but other dimensions from the original dataset are not available.

B. From the Charts Menu (Dataset-Based Chart)

Go to Charts from the main menu and click the + Chart button on the right:

Before creating your first chart this way, you need to create a dataset. Click the Add a dataset button:

Select the appropriate Database, Schema, and Table, then click the Create dataset and create chart button in the bottom-right corner:

From here, you can create an example Bar Chart:

The chart below visualizes the number of anonymous users, segmented by whether the user is new:

Pie charts can also be used to present data as percentages of the entire analyzed dataset:

You can filter data to match specific conditions. For example, in the chart below, all records where 'countryName' is 'NULL' were excluded:

Line charts are effective for time-based analysis. In the example below, the data is segmented by the 'isRobot' dimension:

Creating a Dashboard in Superset

A dashboard in Apache Superset is a collection of interactive charts that lets users explore and analyze data in a single view.

Steps to Create a Dashboard:

1. Click Dashboards in the main menu, then select the + Dashboard button on the right:

2. Choose the charts you’ve previously created, and drag them from the right-hand panel to the center of the screen:

3. You can easily adjust each chart’s size and position within the dashboard using your mouse:

4. Name your dashboard and save the changes:

Now your dashboard is ready to support smarter, data-driven business decisions powered by real-time insights from Apache Druid!

Summary

Connecting Apache Druid with Apache Superset offers a simple yet powerful solution for real-time data exploration. Druid’s high-performance analytics engine, combined with Superset’s interactive visualization tools, enables technical teams to efficiently analyze large-scale datasets. This integration empowers organizations to build customized analytics solutions internally, enhancing data visibility and supporting more informed, strategic decision-making.
