Apache Druid is a powerful, high-performance real-time analytics database that can store valuable company data. However, the lack of built-in tools for data visualization makes it challenging for business users to extract actionable insights directly from the raw data.
Fortunately, Apache Superset – an open-source data visualization and exploration platform – seamlessly integrates with SQL-based databases, including Apache Druid. By combining Druid's fast, scalable data engine with Superset's intuitive visualization capabilities, data and business analysts can more easily uncover insights and drive value from the data.
Below is a simple step-by-step guide to help you connect your Apache Druid instance with Apache Superset and build a sample dashboard. We hope this tutorial proves helpful and empowers your analytics team to create insightful dashboards – ultimately enabling smarter, data-driven business decisions.
1. Create a dedicated folder for a demo:
mkdir demo
cd demo
2. Clone the Superset repository:
git clone --depth=1 https://github.com/apache/superset.git
3. Go into the repository's main directory:
cd superset
4. Add required Apache Druid drivers:
echo "pydruid" >> ./docker/requirements-local.txt
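Because `>>` appends unconditionally, running the command above a second time adds a duplicate `pydruid` line. A minimal sketch of an idempotent variant follows; the helper name `add_requirement` is hypothetical and not part of Superset.

```shell
# Hypothetical helper: append a package name to a requirements file
# only if that exact line is not already present.
add_requirement() {
    req_file=$1
    pkg=$2
    touch "$req_file"
    # -q: quiet, -x: match the whole line, -F: treat pkg as a fixed string
    grep -qxF -- "$pkg" "$req_file" || printf '%s\n' "$pkg" >> "$req_file"
}

# Usage (from the superset directory):
#   add_requirement ./docker/requirements-local.txt pydruid
```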
5. Run the Superset application:
docker-compose build --force-rm
docker-compose up
6. Check the status of the containers:
docker ps
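Reading the `docker ps` table by eye works, but the check can also be scripted. The sketch below assumes the status strings Docker normally prints (`Up ...`, `(unhealthy)`, `(health: starting)`, `Exited ...`); the helper name `containers_not_ready` is hypothetical. It reads `name<TAB>status` lines and prints the names of containers that still need attention.

```shell
# Hypothetical helper: read "name<TAB>status" lines on stdin and print the
# names of containers that are not up, or whose health check has not passed.
containers_not_ready() {
    awk -F '\t' '
        $2 !~ /^Up/ || $2 ~ /unhealthy/ || $2 ~ /health: starting/ { print $1 }
    '
}

# Usage (-a also lists containers that have already exited):
#   docker ps -a --format '{{.Names}}\t{{.Status}}' | containers_not_ready
```

An empty result suggests every listed container is up; containers without a health check are reported only if they are not running.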
1. Download Druid from the official website. This tutorial uses the Apache Druid 32.0.0 release:
wget https://archive.apache.org/dist/druid/32.0.0/apache-druid-32.0.0-bin.tar.gz
2. Extract the binaries:
tar -xzf apache-druid-32.0.0-bin.tar.gz
3. Configure Ports (If Needed)
By default, Apache Druid uses some ports that may conflict with Apache Superset, particularly port 8081, which Superset may use for proxying. If you encounter any issues, follow the steps below to adjust the configuration:
A. Change the Coordinator Port (Default: 8081)
Open the following file:
/apache-druid-32.0.0/conf/druid/auto/coordinator-overlord/runtime.properties
Change the druid.plaintextPort property (line 21 of the file in this release) to a free port, for example:
druid.plaintextPort=8073
B. Change the ZooKeeper AdminServer Port (Default: 8080)
Apache ZooKeeper’s embedded AdminServer binds to port 8080 by default. If it conflicts with other services or causes errors, change the port in:
/apache-druid-32.0.0/conf/zk/zoo.cfg
Override the default port by adding a line such as the following to the configuration:
admin.serverPort=8077
C. Skip Port Checks (Optional)
To bypass port availability checks when using custom ports, set the following environment variable before starting Druid:
export DRUID_SKIP_PORT_CHECK=1
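The two file edits in A and B can also be scripted rather than made by hand. The following is a minimal sketch, not an official Druid tool: `set_property` is a hypothetical helper that replaces an existing `key=value` line or appends one when the key is missing. It assumes keys made of letters, digits and dots, and values without `/`, `&` or backslashes.

```shell
# Hypothetical helper: set KEY=VALUE in a properties-style file, replacing an
# existing "KEY=..." line or appending one if the key is absent.
set_property() {
    prop_file=$1
    key=$2
    value=$3
    # Escape dots so the key is matched literally in the patterns below.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^${pattern}=" "$prop_file"; then
        prop_tmp=$(mktemp)
        # Write through a temp file instead of relying on non-portable "sed -i".
        sed "s/^${pattern}=.*/${key}=${value}/" "$prop_file" > "$prop_tmp" \
            && cat "$prop_tmp" > "$prop_file"
        rm -f "$prop_tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$prop_file"
    fi
}

# Usage (from inside the apache-druid-32.0.0 directory):
#   set_property conf/druid/auto/coordinator-overlord/runtime.properties druid.plaintextPort 8073
#   set_property conf/zk/zoo.cfg admin.serverPort 8077
```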
4. Start Apache Druid Locally
Navigate to the extracted directory and start Druid:
cd apache-druid-32.0.0
./bin/start-druid
Then open your browser and go to:
http://localhost:8888/
This will open the Apache Druid console. If you see the UI, Druid is running successfully:
5. Load a Sample Dataset
To ingest a sample dataset (such as the Wikipedia example) for use with Apache Superset, follow the instructions in the Apache Druid Quickstart (local) tutorial.
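The browser check from step 4 can also be scripted, which is handy when Druid takes a while to start. The sketch below polls a URL with `curl` until it responds without an HTTP error. The helper name `wait_for_http` is hypothetical, and the `/status/health` path in the usage line is an assumption based on Druid's process status API; substitute the console URL if that path is not available in your setup.

```shell
# Hypothetical helper: poll URL until curl gets a non-error HTTP response,
# or until the given number of attempts (roughly one per second) is used up.
wait_for_http() {
    url=$1
    attempts=${2:-60}
    while [ "$attempts" -gt 0 ]; do
        # -f: fail on HTTP errors, -s: silent, -o /dev/null: discard the body
        if curl -fs -o /dev/null --max-time 2 "$url"; then
            return 0
        fi
        attempts=$((attempts - 1))
        if [ "$attempts" -gt 0 ]; then
            sleep 1
        fi
    done
    return 1
}

# Usage:
#   wait_for_http http://localhost:8888/status/health 120 && echo "Druid is up"
```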
1. Go to http://localhost:8088/ in your browser:
2. Use default credentials to log in:
login: admin
password: admin
3. Set up the connection between Apache Superset and Apache Druid:
4. Choose Apache Druid database:
5. Provide a valid SQLAlchemy URI to successfully establish the connection
The general connection URL for open-source Apache Druid is:
druid://<broker_ip_address>:8082/druid/v2/sql
In our case, since Apache Superset is running in Docker and Apache Druid is running on the local host, you need to use 'host.docker.internal' as the hostname:
SQLAlchemy URI: druid://host.docker.internal:8082/druid/v2/sql
In the bottom-right corner, you should see a message confirming that the connection was set up correctly:
6. You can always click the settings icon in the top-right corner of the screen to view your current database connections:
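If the connection test in step 5 fails, it may help to confirm that the Broker answers SQL over HTTP independently of Superset. The sketch below builds the SQLAlchemy URI from a host name and sends a query to the same `/druid/v2/sql` path with `curl`. All three function names are hypothetical, and the payload helper assumes a single-line query.

```shell
# Hypothetical helpers, not part of Superset or Druid.

# Print the SQLAlchemy URI for a Druid Broker (port defaults to 8082).
druid_sqlalchemy_uri() {
    printf 'druid://%s:%s/druid/v2/sql\n' "$1" "${2:-8082}"
}

# Wrap a SQL string in the JSON body sent to Druid's SQL endpoint,
# escaping backslashes and double quotes. Assumes a single-line query.
druid_sql_payload() {
    escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '{"query":"%s"}\n' "$escaped"
}

# POST a query to the Broker: druid_sql HOST 'SQL' [PORT]
druid_sql() {
    curl -fsS -X POST -H 'Content-Type: application/json' \
        -d "$(druid_sql_payload "$2")" "http://$1:${3:-8082}/druid/v2/sql"
}

# Usage (run on the machine where Druid is running):
#   druid_sqlalchemy_uri host.docker.internal
#   druid_sql localhost 'SELECT 1'
```

From the host itself the Broker is reached as `localhost`; `host.docker.internal` is only needed from inside the Superset containers.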
1. Go to SQL Lab to query Apache Druid from Apache Superset. You can access it either from the menu where the database connection was made or by clicking the SQL card at the top to go directly to SQL Lab:
2. Choose the Apache Druid database to query:
3. Run example queries as follows:
SELECT *
FROM wikipedia;
As you may notice, the default record limit is set to 1,000. You can easily adjust this limit using the input field next to the Run button.
Superset also displays the query execution time, allows you to download results as a CSV file, and provides a query history so you can revisit previous queries.
SELECT channel, COUNT(*) AS "Rows per channel"
FROM wikipedia
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;
SQL Lab also allows you to create a Chart directly from that view. These charts can then be used while building the Dashboard.
Note: To create the chart, the SQL query must not include any aliases. Use the query below if you want to create a chart directly from this panel:
SELECT channel, COUNT(*)
FROM wikipedia
GROUP BY 1
ORDER BY 2 DESC;
You can create a chart from an Apache Druid dataset in two different ways:
A. From SQL Lab (Query-Based Chart)
Click the Create Chart button available in SQL Lab beneath the executed query:
You should see a chart creation window like the one below:
When you create a chart from a selected query in the SQL Lab section, the chart's data source is set to that specific query.
This means:
B. From the Charts Menu (Dataset-Based Chart)
Go to Charts from the main menu and click the + Chart button on the right:
Before creating your first chart this way, you need to create a dataset. Click the Add a dataset button:
Select the appropriate Database, Schema, and Table, then click the Create dataset and create chart button in the bottom-right corner:
From here, you can create an example Bar Chart:
The chart below visualizes the number of anonymous users, segmented by whether the user is new:
Pie charts can also be used to present data as percentages of the entire analyzed dataset:
You can filter data to match specific conditions. For example, in the chart below, all records where 'countryName' is 'NULL' were excluded:
Line charts are effective for time-based analysis. In the example below, the data is segmented by the 'isRobot' dimension:
Dashboards in Apache Superset are a collection of interactive charts that allow users to explore and analyze data in a single view.
1. Click Dashboards in the main menu, then select the + Dashboard button on the right:
2. Choose the charts you’ve previously created, and drag them from the right-hand panel to the center of the screen:
3. You can easily adjust each chart’s size and position within the dashboard using your mouse:
4. Name your dashboard and save the changes:
Now your dashboard is ready to support smarter, data-driven business decisions powered by real-time insights from Apache Druid!
Connecting Apache Druid with Apache Superset offers a simple yet powerful solution for real-time data exploration. Druid’s high-performance analytics engine, combined with Superset’s interactive visualization tools, enables technical teams to efficiently analyze large-scale datasets. This integration empowers organizations to build customized analytics solutions internally, enhancing data visibility and supporting more informed, strategic decision-making.