Dataverse for Apache Druid®

What is Dataverse for Apache Druid®?

Dataverse for Apache Druid® is a real-time database for powering modern analytics applications. It can be deployed in the cloud of your choice and brings scalability and high availability to your analytics and time series applications.

Why Apache Druid?

Build fast, modern data analytics applications

Druid is designed for workflows where fast ad-hoc analytics, instant data visibility, or supporting high concurrency is important. As such, Druid is often used to power UIs where an interactive, consistent user experience is desired.

Easy integration with your existing data pipelines

Druid streams data from message buses such as Kafka and Amazon Kinesis, and batch-loads files from data lakes such as HDFS and Amazon S3. Druid supports most popular file formats for structured and semi-structured data.
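As a rough illustration of streaming ingestion, the sketch below builds a Kafka supervisor spec of the kind Druid accepts as JSON on its supervisor endpoint (POST /druid/indexer/v1/supervisor). The helper name, the datasource "clicks", the topic "clickstream", the broker address, and the column names are all assumptions made for this example; they do not come from this page, and a real deployment would need its own schema and tuning settings.

```python
import json


def build_kafka_supervisor_spec(datasource, topic, bootstrap_servers,
                                timestamp_column="timestamp", dimensions=None):
    """Return a minimal Kafka supervisor spec as a plain dict.

    Hypothetical helper for illustration only; every argument value
    is supplied by the caller and nothing is sent over the network.
    """
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Column holding the event time, assumed to be ISO 8601.
                "timestampSpec": {"column": timestamp_column, "format": "iso"},
                "dimensionsSpec": {"dimensions": dimensions or []},
                "granularitySpec": {
                    "segmentGranularity": "hour",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "type": "kafka",
                "topic": topic,
                # Assumes each Kafka message is a single JSON object.
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


# Example values are placeholders, not settings from this documentation.
spec = build_kafka_supervisor_spec(
    datasource="clicks",
    topic="clickstream",
    bootstrap_servers="localhost:9092",
    dimensions=["page", "country"],
)
print(json.dumps(spec, indent=2))
```

The printed JSON is what would be submitted to the supervisor endpoint; keeping spec construction separate from submission makes the spec easy to review and version before it is applied.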

Fast, consistent queries at high concurrency

Druid has been benchmarked to greatly outperform legacy solutions. Druid combines novel storage ideas, indexing structures, and both exact and approximate queries to return most results in under a second.

Broad applicability

Druid unlocks new types of queries and workflows for clickstream, APM, supply chain, network telemetry, digital marketing, risk/fraud, and many other types of data. Druid is purpose built for rapid, ad-hoc queries on both real-time and historical data.
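To make the idea of an ad-hoc query over recent data more concrete, here is a minimal sketch that prepares a request for Druid's SQL HTTP endpoint (/druid/v2/sql). The base URL, the datasource name "clicks", and the helper names are assumptions chosen for illustration; the actual host, port, and any authentication depend on how your service is deployed.

```python
import json
import urllib.request


def build_sql_request(base_url, sql):
    """Build a POST request for the Druid SQL endpoint (hypothetical helper)."""
    payload = {"query": sql, "resultFormat": "object"}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/druid/v2/sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sql(base_url, sql, timeout=10):
    """Send the query and return the decoded rows (requires a reachable cluster)."""
    with urllib.request.urlopen(build_sql_request(base_url, sql), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Events per minute over the last hour; "clicks" is a placeholder datasource.
EVENTS_PER_MINUTE = """
SELECT TIME_FLOOR(__time, 'PT1M') AS minute_bucket, COUNT(*) AS events
FROM "clicks"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR
GROUP BY 1
ORDER BY 1
"""

# The base URL below is an assumed local address, not one from this page.
request = build_sql_request("http://localhost:8888", EVENTS_PER_MINUTE)
print(request.get_method(), request.full_url)
```

Calling run_sql with the same arguments would execute the query against a running cluster; it is left uncalled here so the example works without one.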

Deploy in public, private, and hybrid clouds

Druid can be deployed in any *NIX environment on commodity hardware, both in the cloud and on premises. Deploying Druid is easy: scaling up and down is as simple as adding and removing Druid services.

Integrates with other Dataverse building blocks

Apache Druid is highly compatible with other Dataverse blocks.

Apache Druid resources