Configuration reference
This page documents all of the configuration properties for each Druid service type.
Recommended configuration file organization
A recommended way of organizing Druid configuration files can be seen in the conf
directory in the Druid package root, shown below:
$ ls -R conf
druid
conf/druid:
_common broker coordinator historical middleManager overlord
conf/druid/_common:
common.runtime.properties log4j2.xml
conf/druid/broker:
jvm.config runtime.properties
conf/druid/coordinator:
jvm.config runtime.properties
conf/druid/historical:
jvm.config runtime.properties
conf/druid/middleManager:
jvm.config runtime.properties
conf/druid/overlord:
jvm.config runtime.properties
Each directory has a runtime.properties file containing configuration properties for the specific Druid service corresponding to the directory, such as historical.
The jvm.config files contain JVM flags such as heap sizing properties for each service.
Common properties shared by all services are placed in _common/common.runtime.properties.
Configuration interpolation
Configuration values can be interpolated from System Properties, Environment Variables, or local files. Below is an example of how this can be used:
druid.metadata.storage.type=${env:METADATA_STORAGE_TYPE}
druid.processing.tmpDir=${sys:java.io.tmpdir}
druid.segmentCache.locations=${file:UTF-8:/config/segment-cache-def.json}
Interpolation is also recursive, so you can do:
druid.segmentCache.locations=${file:UTF-8:${env:SEGMENT_DEF_LOCATION}}
If the property is not set, an exception is thrown on startup. You can provide a default value if desired. Note that a default value does not work with file interpolation: an exception is still thrown if the file does not exist.
druid.metadata.storage.type=${env:METADATA_STORAGE_TYPE:-mysql}
druid.processing.tmpDir=${sys:java.io.tmpdir:-/tmp}
If you need to set a variable that is wrapped by ${...} but do not want it to be interpolated, you can escape it by adding another $. For example:
config.name=$${value}
Common configurations
The properties under this section are common configurations that should be shared across all Druid services in a cluster.
JVM configuration best practices
There are four JVM parameters that we set on all of our services:
- -Duser.timezone=UTC: This sets the default timezone of the JVM to UTC. We always set this and do not test with other default timezones, so local timezones might work, but they also might uncover weird and interesting bugs. To issue queries in a non-UTC timezone, see query granularities.
- -Dfile.encoding=UTF-8: This is similar to the timezone setting; we test assuming UTF-8. Local encodings might work, but they also might result in weird and interesting bugs.
- -Djava.io.tmpdir=<a path>: Various parts of Druid use temporary files to interact with the file system. These files can become quite large. This means that systems with small /tmp directories can cause problems for Druid. Therefore, set the JVM tmp directory to a location with ample space. Also consider the following when configuring the JVM tmp directory:
  - The temp directory should not be volatile tmpfs.
  - This directory should also have good read and write speed.
  - Avoid NFS mounts.
  - The org.apache.druid.java.util.metrics.SysMonitor requires execute privileges on files in java.io.tmpdir. If you are using the system monitor, do not set java.io.tmpdir to a noexec mount.
- -Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager: This allows log4j2 to handle logs for non-log4j2 components (like Jetty) which use standard Java logging.
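For illustration only, a jvm.config for one of the services might combine these four flags with heap settings. The heap sizes and tmp path below are placeholders, not recommendations:
-server
-Xms4g
-Xmx4g
-Duser.timezone=UTC
-Dfile.encoding=UTF-8
-Djava.io.tmpdir=/mnt/disk1/druid-tmp
-Djava.util.logging.manager=org.apache.logging.log4j.jul.LogManager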
Extensions
Many of Druid's external dependencies can be plugged in as modules. Extensions can be provided using the following configs:
Property | Description | Default |
---|---|---|
druid.extensions.directory | The root extension directory where users can put extension-related files. Druid will load extensions stored under this directory. | extensions (This is a relative path to Druid's working directory.) |
druid.extensions.hadoopDependenciesDir | The root Hadoop dependencies directory where users can put Hadoop-related dependency files. Druid will load the dependencies based on the Hadoop coordinate specified in the Hadoop index task. | hadoop-dependencies (This is a relative path to Druid's working directory.) |
druid.extensions.loadList | A JSON array of extensions to load from extension directories by Druid. If it is not specified, its value will be null and Druid will load all the extensions under druid.extensions.directory. If its value is an empty list [], then no extensions will be loaded at all. You can also specify the absolute path of other custom extensions not stored in the common extensions directory. | null |
druid.extensions.searchCurrentClassloader | This is a boolean flag that determines if Druid will search the main classloader for extensions. It defaults to true but can be turned off if you have reason to not automatically add all modules on the classpath. | true |
druid.extensions.useExtensionClassloaderFirst | This is a boolean flag that determines if Druid extensions should prefer loading classes from their own jars rather than jars bundled with Druid. If false, extensions must be compatible with classes provided by any jars bundled with Druid. If true, extensions may depend on conflicting versions. | false |
druid.extensions.hadoopContainerDruidClasspath | Hadoop indexing launches Hadoop jobs, and this configuration provides a way to explicitly set the user classpath for the Hadoop job. By default, this is computed automatically by Druid based on the Druid service classpath and set of extensions. However, sometimes you might want to be explicit to resolve dependency conflicts between Druid and Hadoop. | null |
druid.extensions.addExtensionsToHadoopContainer | Only applicable if druid.extensions.hadoopContainerDruidClasspath is provided. If set to true, then extensions specified in the loadList are added to Hadoop container classpath. Note that when druid.extensions.hadoopContainerDruidClasspath is not provided then extensions are always added to Hadoop container classpath. | false |
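For example, a common.runtime.properties might restrict loading to an explicit set of extensions. The extension names below are only illustrative; any extensions present under druid.extensions.directory, or given by absolute path, can be listed:
druid.extensions.directory=extensions
druid.extensions.loadList=["druid-hdfs-storage", "druid-s3-extensions"]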
Modules
Property | Description | Default |
---|---|---|
druid.modules.excludeList | A JSON array of canonical class names (e.g., "org.apache.druid.somepackage.SomeModule") of module classes which shouldn't be loaded, even if they are found in extensions specified by druid.extensions.loadList, or in the list of core modules specified to be loaded on a particular Druid service type. Useful when an otherwise useful extension contains a module that shouldn't be loaded on a particular Druid service type because some dependencies of that module can't be satisfied. | [] |
ZooKeeper
We recommend just setting the base ZK path and the ZK service host, but all ZK paths that Druid uses can be overridden with absolute paths.
Property | Description | Default |
---|---|---|
druid.zk.paths.base | Base ZooKeeper path. | /druid |
druid.zk.service.host | The ZooKeeper hosts to connect to. This is a REQUIRED property and therefore a host address must be supplied. | none |
druid.zk.service.user | The username to authenticate with ZooKeeper. This is an optional property. | none |
druid.zk.service.pwd | The Password Provider or the string password to authenticate with ZooKeeper. This is an optional property. | none |
druid.zk.service.authScheme | digest is the only authentication scheme supported. | digest |
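For example, a minimal ZooKeeper configuration in common.runtime.properties might look like the following, where the host names are placeholders:
druid.zk.service.host=zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181
druid.zk.paths.base=/druid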
ZooKeeper behavior
Property | Description | Default |
---|---|---|
druid.zk.service.sessionTimeoutMs | ZooKeeper session timeout, in milliseconds. | 30000 |
druid.zk.service.connectionTimeoutMs | ZooKeeper connection timeout, in milliseconds. | 15000 |
druid.zk.service.compress | Boolean flag for whether or not created Znodes should be compressed. | true |
druid.zk.service.acl | Boolean flag for whether or not to enable ACL security for ZooKeeper. If ACL is enabled, zNode creators will have all permissions. | false |
druid.zk.service.pathChildrenCacheStrategy | Dictates the underlying caching strategy for service announcements. Set to true to let announcers use Apache Curator's PathChildrenCache strategy; otherwise, they use the NodeCache strategy. Consider the NodeCache strategy when you are dealing with a huge number of ZooKeeper watches in your cluster. | true |
Path configuration
Druid interacts with ZooKeeper through a set of standard path configurations. We recommend just setting the base ZooKeeper path, but all ZooKeeper paths that Druid uses can be overridden with absolute paths.
Property | Description | Default |
---|---|---|
druid.zk.paths.base | Base ZooKeeper path. | /druid |
druid.zk.paths.propertiesPath | ZooKeeper properties path. | ${druid.zk.paths.base}/properties |
druid.zk.paths.announcementsPath | Druid service announcement path. | ${druid.zk.paths.base}/announcements |
druid.zk.paths.liveSegmentsPath | Current path for where Druid services announce their segments. | ${druid.zk.paths.base}/segments |
druid.zk.paths.coordinatorPath | Used by the Coordinator for leader election. | ${druid.zk.paths.base}/coordinator |
The indexing service also uses its own set of paths. These configs can be included in the common configuration.
Property | Description | Default |
---|---|---|
druid.zk.paths.indexer.base | Base ZooKeeper path for the indexing service. | ${druid.zk.paths.base}/indexer |
druid.zk.paths.indexer.announcementsPath | Middle Managers announce themselves here. | ${druid.zk.paths.indexer.base}/announcements |
druid.zk.paths.indexer.tasksPath | Used to assign tasks to Middle Managers. | ${druid.zk.paths.indexer.base}/tasks |
druid.zk.paths.indexer.statusPath | Parent path for announcement of task statuses. | ${druid.zk.paths.indexer.base}/status |
If druid.zk.paths.base and druid.zk.paths.indexer.base are both set, and none of the other druid.zk.paths.* or druid.zk.paths.indexer.* values are set, then the other properties will be evaluated relative to their respective base.
For example, if druid.zk.paths.base is set to /druid1 and druid.zk.paths.indexer.base is set to /druid2, then druid.zk.paths.announcementsPath will default to /druid1/announcements, while druid.zk.paths.indexer.announcementsPath will default to /druid2/announcements.
The following path is used for service discovery. It is not affected by druid.zk.paths.base and must be specified separately.
Property | Description | Default |
---|---|---|
druid.discovery.curator.path | Services announce themselves under this ZooKeeper path. | /druid/discovery |
TLS
General configuration
Property | Description | Default |
---|---|---|
druid.enablePlaintextPort | Enable/Disable HTTP connector. | true |
druid.enableTlsPort | Enable/Disable HTTPS connector. | false |
Although not recommended, both the HTTP and HTTPS connectors can be enabled at the same time. The respective ports are configurable using the druid.plaintextPort and druid.tlsPort properties on each service. See the Configuration section of the individual services to check the valid and default values for these ports.
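For example, to serve only HTTPS on a service, you might set something like the following; the port shown is illustrative, so check the individual service's configuration for its actual default TLS port:
druid.enablePlaintextPort=false
druid.enableTlsPort=true
druid.tlsPort=8283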
Jetty server TLS configuration
Druid uses Jetty as an embedded web server. To learn more about TLS/SSL, certificates, and related concepts in Jetty, including explanations of the configuration settings below, see "Configuring SSL/TLS KeyStores" in the Jetty Operations Guide.
For information about TLS/SSL support in Java in general, see the Java Secure Socket Extension (JSSE) Reference Guide. The Java Cryptography Architecture Standard Algorithm Name Documentation for JDK 11 lists all possible values for the following properties, among others provided by the Java implementation.
Property | Description | Default | Required |
---|---|---|---|
druid.server.https.keyStorePath | The file path or URL of the TLS/SSL KeyStore. | none | yes |
druid.server.https.keyStoreType | The type of the KeyStore. | none | yes |
druid.server.https.certAlias | Alias of TLS/SSL certificate for the connector. | none | yes |
druid.server.https.keyStorePassword | The Password Provider or String password for the KeyStore. | none | yes |
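A minimal sketch of the mandatory KeyStore settings follows; the path, type, alias, and password are placeholder values:
druid.server.https.keyStorePath=/opt/druid/ssl/server.jks
druid.server.https.keyStoreType=jks
druid.server.https.certAlias=druid
druid.server.https.keyStorePassword=changeit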
The following table contains non-mandatory advanced configuration options. Use them with caution.
Property | Description | Default | Required |
---|---|---|---|
druid.server.https.keyManagerFactoryAlgorithm | Algorithm to use for creating KeyManager, more details here. | javax.net.ssl.KeyManagerFactory.getDefaultAlgorithm() | no |
druid.server.https.keyManagerPassword | The Password Provider or String password for the Key Manager. | none | no |
druid.server.https.includeCipherSuites | List of cipher suite names to include. You can either use the exact cipher suite name or a regular expression. | Jetty's default include cipher list | no |
druid.server.https.excludeCipherSuites | List of cipher suite names to exclude. You can either use the exact cipher suite name or a regular expression. | Jetty's default exclude cipher list | no |
druid.server.https.includeProtocols | List of exact protocols names to include. | Jetty's default include protocol list | no |
druid.server.https.excludeProtocols | List of exact protocols names to exclude. | Jetty's default exclude protocol list | no |
Internal client TLS configuration (requires simple-client-sslcontext extension)
These properties apply to the SSLContext that will be provided to the internal HTTP client that Druid services use to communicate with each other. These properties require the simple-client-sslcontext extension to be loaded. Without it, Druid services will be unable to communicate with each other when TLS is enabled.
Property | Description | Default | Required |
---|---|---|---|
druid.client.https.protocol | SSL protocol to use. | TLSv1.2 | no |
druid.client.https.trustStoreType | The type of the key store where trusted root certificates are stored. | java.security.KeyStore.getDefaultType() | no |
druid.client.https.trustStorePath | The file path or URL of the TLS/SSL Key store where trusted root certificates are stored. | none | yes |
druid.client.https.trustStoreAlgorithm | Algorithm to be used by TrustManager to validate certificate chains | javax.net.ssl.TrustManagerFactory.getDefaultAlgorithm() | no |
druid.client.https.trustStorePassword | The Password Provider or String password for the Trust Store. | none | yes |
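For example, after adding simple-client-sslcontext to the extension load list, the internal client might be pointed at a trust store like this (placeholder path and password):
druid.client.https.trustStorePath=/opt/druid/ssl/truststore.jks
druid.client.https.trustStorePassword=changeit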
The Java Cryptography Architecture Standard Algorithm Name Documentation lists all the possible values for the above-mentioned configs, among others provided by the Java implementation.
Authentication and authorization
Property | Type | Description | Default | Required |
---|---|---|---|---|
druid.auth.authenticatorChain | JSON List of Strings | List of Authenticator type names | ["allowAll"] | no |
druid.escalator.type | String | Type of the Escalator that should be used for internal Druid communications. This Escalator must use an authentication scheme that is supported by an Authenticator in druid.auth.authenticatorChain . | noop | no |
druid.auth.authorizers | JSON List of Strings | List of Authorizer type names | ["allowAll"] | no |
druid.auth.unsecuredPaths | List of Strings | List of paths for which security checks will not be performed. All requests to these paths will be allowed. | [] | no |
druid.auth.allowUnauthenticatedHttpOptions | Boolean | If true, skip authentication checks for HTTP OPTIONS requests. This is needed for certain use cases, such as supporting CORS pre-flight requests. Note that disabling authentication checks for OPTIONS requests will allow unauthenticated users to determine what Druid endpoints are valid (by checking if the OPTIONS request returns a 200 instead of 404), so enabling this option may reveal information about server configuration, including information about what extensions are loaded (if those extensions add endpoints). | false | no |
For more information, please see Authentication and Authorization.
For configuration options for specific auth extensions, please refer to the extension documentation.
Startup logging
All services can log debugging information on startup.
Property | Description | Default |
---|---|---|
druid.startup.logging.logProperties | Log all properties on startup (from common.runtime.properties, runtime.properties, and the JVM command line). | false |
druid.startup.logging.maskProperties | Masks sensitive properties (passwords, for example) containing these words. | ["password"] |
Note that some sensitive information may be logged if these settings are enabled.
Request logging
All services that can serve queries can also log the query requests they see. Broker services can additionally log the SQL requests (both from HTTP and JDBC) they see. For an example of setting up request logging, see Request logging.
Property | Description | Default |
---|---|---|
druid.request.logging.type | How to log every query request. Choices: noop , file , emitter , slf4j , filtered , composing , switching | noop (request logging disabled by default) |
To enable sending all the HTTP requests to a log, set org.apache.druid.jetty.RequestLog to the DEBUG level. See Logging for more information.
File request logging
The file request logger stores daily request logs on disk.
Property | Description | Default |
---|---|---|
druid.request.logging.dir | Historical, Realtime, and Broker services maintain request logs of all of the requests they get (interaction is via POST, so normal request logs don't generally capture information about the actual query). This property specifies the directory to store the request logs in. | none |
druid.request.logging.filePattern | Joda datetime format for each file. | "yyyy-MM-dd'.log'" |
druid.request.logging.durationToRetain | Period to retain the request logs on disk. The period should be at least as long as roll period. | none |
druid.request.logging.rollPeriod | Defines the log rotation period for request logs. The period should be at least PT1H . For periods smaller than 1 day, it is recommended to use "yyyy-MM-dd-HH'.log'" as the file pattern. | P1D |
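For example, to keep a week of daily request logs in an illustrative directory:
druid.request.logging.type=file
druid.request.logging.dir=/var/druid/request-logs
druid.request.logging.durationToRetain=P7D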
The format of request logs is TSV, one line per request, with five fields: timestamp, remote_addr, native_query, query_context, sql_query.
For a native JSON request, the sql_query field is empty. For example:
2019-01-14T10:00:00.000Z 127.0.0.1 {"queryType":"topN","dataSource":{"type":"table","name":"wikiticker"},"virtualColumns":[],"dimension":{"type":"LegacyDimensionSpec","dimension":"page","outputName":"page","outputType":"STRING"},"metric":{"type":"LegacyTopNMetricSpec","metric":"count"},"threshold":10,"intervals":{"type":"LegacySegmentSpec","intervals":["2015-09-12T00:00:00.000Z/2015-09-13T00:00:00.000Z"]},"filter":null,"granularity":{"type":"all"},"aggregations":[{"type":"count","name":"count"}],"postAggregations":[],"context":{"queryId":"74c2d540-d700-4ebd-b4a9-3d02397976aa"},"descending":false} {"query/time":100,"query/bytes":800,"success":true,"identity":"user1"}
For a SQL query request, the native_query field is empty. For example:
2019-01-14T10:00:00.000Z 127.0.0.1 {"sqlQuery/time":100, "sqlQuery/planningTimeMs":10, "sqlQuery/bytes":600, "success":true, "identity":"user1"} {"query":"SELECT page, COUNT(*) AS Edits FROM wikiticker WHERE TIME_IN_INTERVAL(\"__time\", '2015-09-12/2015-09-13') GROUP BY page ORDER BY Edits DESC LIMIT 10","context":{"sqlQueryId":"c9d035a0-5ffd-4a79-a865-3ffdadbb5fdd","nativeQueryIds":"[490978e4-f5c7-4cf6-b174-346e63cf8863]"}}
Emitter request logging
The emitter request logger emits every request to the external location specified in the emitter configuration.
Property | Description | Default |
---|---|---|
druid.request.logging.feed | Feed name for requests. | none |
SLF4J request logging
The slf4j request logger logs every request using SLF4J. It serializes native queries into JSON in the log message regardless of the SLF4J format specification. Requests are logged under the class org.apache.druid.server.log.LoggingRequestLogger.
Property | Description | Default |
---|---|---|
druid.request.logging.setMDC | If you want to set MDC entries within the log entry, set this value to true . Your logging system must be configured to support MDC in order to format this data. | false |
druid.request.logging.setContextMDC | Set to "true" to add the Druid query context to the MDC entries. Only applies when setMDC is true . | false |
For a native query, the following MDC fields are populated when setMDC is true:
MDC field | Description |
---|---|
queryId | The query ID |
sqlQueryId | The SQL query ID if this query is part of a SQL request |
dataSource | The datasource the query was against |
queryType | The type of the query |
hasFilters | If the query has any filters |
remoteAddr | The remote address of the requesting client |
duration | The duration of the query interval |
resultOrdering | The ordering of results |
descending | If the query is a descending query |
Filtered request logging
The filtered request logger filters requests based on the query type or how long a query takes to complete.
For native queries, the logger only logs requests when the query/time metric exceeds the threshold provided in queryTimeThresholdMs.
For SQL queries, it only logs requests when the sqlQuery/time metric exceeds the threshold provided in sqlQueryTimeThresholdMs.
See Metrics for more details on query metrics.
Requests that meet the threshold are logged using the request logger type set in druid.request.logging.delegate.type.
Property | Description | Default |
---|---|---|
druid.request.logging.queryTimeThresholdMs | Threshold value for the query/time metric in milliseconds. | 0, i.e., no filtering |
druid.request.logging.sqlQueryTimeThresholdMs | Threshold value for the sqlQuery/time metric in milliseconds. | 0, i.e., no filtering |
druid.request.logging.mutedQueryTypes | Query requests of these types are not logged. Query types are defined as string objects corresponding to the "queryType" value for the specified query in the Druid's native JSON query API. Misspelled query types will be ignored. Example to ignore scan and timeBoundary queries: ["scan", "timeBoundary"] | [] |
druid.request.logging.delegate.type | Type of delegate request logger to log requests. | none |
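For example, the following sketch logs only native queries slower than one second and SQL queries slower than two seconds, delegating matching requests to the slf4j logger; the thresholds are illustrative:
druid.request.logging.type=filtered
druid.request.logging.queryTimeThresholdMs=1000
druid.request.logging.sqlQueryTimeThresholdMs=2000
druid.request.logging.delegate.type=slf4j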
Composing request logging
The composing request logger emits request logs to multiple request loggers.
Property | Description | Default |
---|---|---|
druid.request.logging.loggerProviders | List of request loggers for emitting request logs. | none |
Switching request logging
The switching request logger routes native query request logs to one request logger and SQL query request logs to another request logger.
Property | Description | Default |
---|---|---|
druid.request.logging.nativeQueryLogger | Request logger for emitting native query request logs. | none |
druid.request.logging.sqlQueryLogger | Request logger for emitting SQL query request logs. | none |
Audit logging
The Coordinator and Overlord log changes to lookups, segment load/drop rules, and dynamic configurations for auditing.
Property | Description | Default |
---|---|---|
druid.audit.manager.type | Type of audit manager used for handling audited events. Audited events are logged when set to log or persisted in metadata store when set to sql . | sql |
druid.audit.manager.logLevel | Log level of audit events with possible values DEBUG, INFO, WARN. This property is used only when druid.audit.manager.type is set to log . | INFO |
druid.audit.manager.auditHistoryMillis | Default duration for querying audit history. | 1 week |
druid.audit.manager.includePayloadAsDimensionInMetric | Boolean flag on whether to add payload column in service metric. | false |
druid.audit.manager.maxPayloadSizeBytes | The maximum size of an audit payload to store in Druid's metadata store audit table. If the size of the audit payload exceeds this value, the audit log is stored with a message indicating that the payload was omitted instead. Setting maxPayloadSizeBytes to -1 (default value) disables this check, meaning Druid will always store the audit payload regardless of its size. Setting to any negative number other than -1 is invalid. Human-readable format is supported, see here. | -1 |
druid.audit.manager.skipNullField | If true, the audit payload stored in metadata store will exclude any field with null value. | false |
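For example, to log audited events at INFO level instead of persisting them to the metadata store:
druid.audit.manager.type=log
druid.audit.manager.logLevel=INFO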
Metadata storage
These properties specify the JDBC connection and other configuration around the metadata storage. The only services that connect to the metadata storage with these properties are the Coordinator and Overlord.
Property | Description | Default |
---|---|---|
druid.metadata.storage.type | The type of metadata storage to use. One of mysql , postgresql , or derby . | derby |
druid.metadata.storage.connector.connectURI | The JDBC URI for the database to connect to | none |
druid.metadata.storage.connector.user | The username to connect with. | none |
druid.metadata.storage.connector.password | The Password Provider or String password used to connect with. | none |
druid.metadata.storage.connector.createTables | If Druid requires a table and it doesn't exist, create it? | true |
druid.metadata.storage.tables.base | The base name for tables. | druid |
druid.metadata.storage.tables.dataSource | The table to use to look for datasources created by Kafka Indexing Service. | druid_dataSource |
druid.metadata.storage.tables.pendingSegments | The table to use to look for pending segments. | druid_pendingSegments |
druid.metadata.storage.tables.segments | The table to use to look for segments. | druid_segments |
druid.metadata.storage.tables.rules | The table to use to look for segment load/drop rules. | druid_rules |
druid.metadata.storage.tables.config | The table to use to look for configs. | druid_config |
druid.metadata.storage.tables.tasks | Used by the indexing service to store tasks. | druid_tasks |
druid.metadata.storage.tables.taskLog | Used by the indexing service to store task logs. | druid_tasklogs |
druid.metadata.storage.tables.taskLock | Used by the indexing service to store task locks. | druid_tasklocks |
druid.metadata.storage.tables.supervisors | Used by the indexing service to store supervisor configurations. | druid_supervisors |
druid.metadata.storage.tables.audit | The table to use for audit history of configuration changes, such as Coordinator rules. | druid_audit |
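As a sketch, a MySQL-backed metadata store could be configured as follows on the Coordinator and Overlord; the host, database name, and credentials are placeholders, and the corresponding MySQL metadata storage extension must also be loaded:
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://metadata.example.com:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=changeme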
Deep storage
These configurations control how Druid pushes segments to and pulls segments from deep storage.
Property | Description | Default |
---|---|---|
druid.storage.type | The type of deep storage to use. One of local , noop , s3 , hdfs , c* . | local |
Local deep storage
Local deep storage uses the local filesystem.
Property | Description | Default |
---|---|---|
druid.storage.storageDirectory | Directory on disk to use as deep storage. | /tmp/druid/localStorage |
Noop deep storage
This deep storage doesn't do anything. There are no configs.
S3 deep storage
This deep storage is used to interface with Amazon's S3. Note that the druid-s3-extensions extension must be loaded.
The table below shows some important configurations for S3. See S3 Deep Storage for full configurations.
Property | Description | Default |
---|---|---|
druid.storage.bucket | S3 bucket name. | none |
druid.storage.baseKey | S3 object key prefix for storage. | none |
druid.storage.disableAcl | Boolean flag for ACL. If this is set to false, full control is granted to the bucket owner. This may require setting additional permissions. See S3 permissions settings. | false |
druid.storage.archiveBucket | S3 bucket name for archiving when running the archive task. | none |
druid.storage.archiveBaseKey | S3 object key prefix for archiving. | none |
druid.storage.sse.type | Server-side encryption type. Should be one of s3, kms, or custom. See the Server-side encryption section below for more details. | None |
druid.storage.sse.kms.keyId | AWS KMS key ID. This is used only when druid.storage.sse.type is kms and can be empty to use the default key ID. | None |
druid.storage.sse.custom.base64EncodedKey | Base64-encoded key. Should be specified if druid.storage.sse.type is custom . | None |
druid.storage.useS3aSchema | If true, use the "s3a" filesystem when using Hadoop-based ingestion. If false, the "s3n" filesystem will be used. Only affects Hadoop-based ingestion. | false |
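For example, with the druid-s3-extensions extension loaded, a placeholder bucket and prefix could be configured like this:
druid.storage.type=s3
druid.storage.bucket=my-druid-bucket
druid.storage.baseKey=druid/segments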
HDFS deep storage
This deep storage is used to interface with HDFS. You must load the druid-hdfs-storage extension.
Property | Description | Default |
---|---|---|
druid.storage.storageDirectory | HDFS directory to use as deep storage. | none |
Cassandra deep storage
This deep storage is used to interface with Cassandra. You must load the druid-cassandra-storage extension.
Property | Description | Default |
---|---|---|
druid.storage.host | Cassandra host. | none |
druid.storage.keyspace | Cassandra key space. | none |
Centralized datasource schema (Experimental)
This is an experimental feature to improve datasource schema management by persisting segment schemas to the metadata store and caching them on the Coordinator. Traditionally, Brokers issue segment metadata queries to data nodes and tasks to fetch the schemas of all available segments. Each Broker then individually builds the schema of a datasource by combining the schemas of all the segments of that datasource. This mechanism is redundant and prone to errors as there is no single source of truth for schemas.
Centralized schema management improves upon this design as follows:
- Tasks publish segment schema along with segment metadata to the database.
- Tasks announce schema for realtime segments periodically to the Coordinator.
- Coordinator caches segment schemas and builds a combined schema for each datasource.
- Brokers poll the datasource schema cached on the Coordinator rather than building it on their own.
- Brokers still retain the ability to build a datasource schema if they are unable to fetch it from the Coordinator.
Property | Description | Default | Required |
---|---|---|---|
druid.centralizedDatasourceSchema.enabled | Boolean flag for enabling datasource schema building and caching on the Coordinator. This property should be specified in the common runtime properties. | false | No. |
druid.indexer.fork.property.druid.centralizedDatasourceSchema.enabled | This config should be set when the CentralizedDatasourceSchema feature is enabled. It should be specified in the Middle Manager runtime properties. | false | No. |
If you enable this feature, you can query datasources that are only stored in deep storage and are not loaded on a Historical. For more information, see Query from deep storage.
For stale schema cleanup configs, refer to properties with the prefix druid.coordinator.kill.segmentSchema in Metadata Management.
Ingestion security configuration
HDFS input source
You can set the following property to specify permissible protocols for the HDFS input source.
Property | Possible values | Description | Default |
---|---|---|---|
druid.ingestion.hdfs.allowedProtocols | List of protocols | Allowed protocols for the HDFS input source. | ["hdfs"] |
HTTP input source
You can set the following property to specify permissible protocols for the HTTP input source.
Property | Possible values | Description | Default |
---|---|---|---|
druid.ingestion.http.allowedProtocols | List of protocols | Allowed protocols for the HTTP input source. | ["http", "https"] |
druid.ingestion.http.allowedHeaders | List of headers | A list of permitted request headers for the HTTP input source. By default, the list is empty, which means no headers are allowed in the ingestion specification. | [] |
External data access security configuration
JDBC connections to external databases
You can use the following properties to specify permissible JDBC options for external database connections, such as JDBC-based ingestion and lookups.
These properties do not apply to metadata storage connections.
Property | Possible values | Description | Default |
---|---|---|---|
druid.access.jdbc.enforceAllowedProperties | Boolean | When true, Druid applies druid.access.jdbc.allowedProperties to JDBC connections starting with jdbc:postgresql: , jdbc:mysql: , or jdbc:mariadb: . When false, Druid allows any kind of JDBC connections without JDBC property validation. This config is for backward compatibility especially during upgrades since enforcing allow list can break existing ingestion jobs or lookups based on JDBC. This config is deprecated and will be removed in a future release. | true |
druid.access.jdbc.allowedProperties | List of JDBC properties | Defines a list of allowed JDBC properties. Druid always enforces the list for all JDBC connections starting with jdbc:postgresql: , jdbc:mysql: , and jdbc:mariadb: if druid.access.jdbc.enforceAllowedProperties is set to true.This option is tested against MySQL connector 8.2.0, MariaDB connector 2.7.4, and PostgreSQL connector 42.2.14. Other connector versions might not work. | ["useSSL", "requireSSL", "ssl", "sslmode"] |
druid.access.jdbc.allowUnknownJdbcUrlFormat | Boolean | When false, Druid only accepts JDBC connections starting with jdbc:postgresql: or jdbc:mysql: . When true, Druid allows JDBC connections to any kind of database, but only enforces druid.access.jdbc.allowedProperties for PostgreSQL and MySQL/MariaDB. | true |
Task logging
You can use the druid.indexer.logs configuration to set a long-term storage location for task log files and to set a retention policy.
For more information about ingestion tasks and the services that generate logs, see the task reference.
Log long-term storage
Property | Description | Default |
---|---|---|
druid.indexer.logs.type | Where to store task logs. One of noop, s3, azure, google, hdfs, file. | file |
File task logs
Store task logs in the local filesystem.
Property | Description | Default |
---|---|---|
druid.indexer.logs.directory | Local filesystem path. | log |
S3 task logs
Store task logs in S3. Note that the druid-s3-extensions extension must be loaded.
Property | Description | Default |
---|---|---|
druid.indexer.logs.s3Bucket | S3 bucket name. | none |
druid.indexer.logs.s3Prefix | S3 key prefix. | none |
druid.indexer.logs.disableAcl | Boolean flag for ACL. If this is set to false, full control is granted to the bucket owner. If the task logs bucket is the same as the deep storage (S3) bucket, then the value of this property needs to be set to true if druid.storage.disableAcl has been set to true. | false |
Azure Blob Store task logs
Store task logs in Azure Blob Store. To enable this feature, load the druid-azure-extensions extension, and configure deep storage for Azure. Druid uses the same authentication method configured for deep storage and stores task logs in the same storage account (set in druid.azure.account).
Property | Description | Default |
---|---|---|
druid.indexer.logs.container | The Azure Blob Store container to write logs to. | Must be set. |
druid.indexer.logs.prefix | The path to prepend to logs. | Must be set. |
Google Cloud Storage task logs
Store task logs in Google Cloud Storage.
Note: The druid-google-extensions extension must be loaded, and this uses the same storage settings as the deep storage module for Google.
Property | Description | Default |
---|---|---|
druid.indexer.logs.bucket | The Google Cloud Storage bucket to write logs to | none |
druid.indexer.logs.prefix | The path to prepend to logs | none |
HDFS task logs
Store task logs in HDFS. Note that the druid-hdfs-storage extension must be loaded.
Property | Description | Default |
---|---|---|
druid.indexer.logs.directory | The directory to store logs. | none |
Log retention policy
Property | Description | Default |
---|---|---|
druid.indexer.logs.kill.enabled | Boolean value for whether to enable deletion of old task logs. If set to true, Overlord will submit kill tasks periodically based on druid.indexer.logs.kill.delay specified, which will delete task logs from the log directory as well as tasks and tasklogs table entries in metadata storage except for tasks created in the last druid.indexer.logs.kill.durationToRetain period. | false |
druid.indexer.logs.kill.durationToRetain | Required if kill is enabled. Duration in milliseconds; task logs and entries in task-related metadata storage tables created within the last durationToRetain milliseconds are retained. | None |
druid.indexer.logs.kill.initialDelay | Optional. Number of milliseconds after Overlord start when first auto kill is run. | random value less than 300000 (5 mins) |
druid.indexer.logs.kill.delay | Optional. Number of milliseconds of delay between successive executions of auto kill run. | 21600000 (6 hours) |
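For example, the following sketch deletes task logs and the corresponding metadata entries older than 30 days (2592000000 milliseconds), checking every 6 hours; the values are illustrative:
druid.indexer.logs.kill.enabled=true
druid.indexer.logs.kill.durationToRetain=2592000000
druid.indexer.logs.kill.delay=21600000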
API error response
You can configure Druid API error responses to hide internal information like the Druid class name, stack trace, thread name, servlet name, code, line/column number, host, or IP address.
Property | Description | Default |
---|---|---|
druid.server.http.showDetailedJettyErrors | When set to true, any error from the Jetty layer / Jetty filter includes the following fields in the JSON response: servlet , message , url , status , and cause , if it exists. When set to false, the JSON response only includes message , url , and status . The field values remain unchanged. | true |
druid.server.http.errorResponseTransform.strategy | Error response transform strategy. The strategy controls how Druid transforms error responses from Druid services. When unset or set to none , Druid leaves error responses unchanged. | none |
Error response transform strategy
You can use an error response transform strategy to transform error responses from within Druid services to hide internal information.
When you specify an error response transform strategy other than none, Druid transforms the error responses from Druid services as follows:
- For any query API that fails in the Router service, Druid sets the fields errorClass and host to null. Druid applies the transformation strategy to the errorMessage field.
- For any SQL query API that fails, for example POST /druid/v2/sql/..., Druid sets the fields errorClass and host to null. Druid applies the transformation strategy to the errorMessage field.
- For any JDBC related exceptions, Druid will turn all checked exceptions into QueryInterruptedException; otherwise, Druid will attempt to keep the exception as the same type. For example, if the original exception isn't owned by Druid, it will become QueryInterruptedException. Druid applies the transformation strategy to the errorMessage field.
No error response transform strategy
In this mode, Druid leaves error responses from underlying services unchanged and returns the unchanged errors to the API client.
This is the default Druid error response mode. To explicitly enable this strategy, set druid.server.http.errorResponseTransform.strategy to none.
Allowed regular expression error response transform strategy
In this mode, Druid validates the error responses from underlying services against a list of regular expressions. Only error messages that match a configured regular expression are returned. To enable this strategy, set druid.server.http.errorResponseTransform.strategy to allowedRegex.
Property | Description | Default |
---|---|---|
druid.server.http.errorResponseTransform.allowedRegex | The list of regular expressions Druid uses to validate error messages. If the error message matches any of the regular expressions, then Druid includes it in the response unchanged. If the error message does not match any of the regular expressions, Druid replaces the error message with null or with a default message depending on the type of underlying Exception. | [] |
For example, consider the following error response:
{"error":"Plan validation failed","errorMessage":"org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 38: Object 'nonexistent-datasource' not found","errorClass":"org.apache.calcite.tools.ValidationException","host":null}
If druid.server.http.errorResponseTransform.allowedRegex is set to [], Druid transforms the query error response to the following:
{"error":"Plan validation failed","errorMessage":null,"errorClass":null,"host":null}
On the other hand, if druid.server.http.errorResponseTransform.allowedRegex is set to [".*CalciteContextException.*"], then Druid transforms the query error response to the following:
{"error":"Plan validation failed","errorMessage":"org.apache.calcite.runtime.CalciteContextException: From line 1, column 15 to line 1, column 38: Object 'nonexistent-datasource' not found","errorClass":null,"host":null}
Overlord discovery
This config is used to find the Overlord using Curator service discovery. Only required if you are actually running an Overlord.
Property | Description | Default |
---|---|---|
druid.selectors.indexing.serviceName | The druid.service name of the Overlord service. To start the Overlord with a different name, set it with this property. | druid/overlord |
Coordinator discovery
This config is used to find the Coordinator using Curator service discovery. This config is used by the realtime indexing services to get information about the segments loaded in the cluster.
Property | Description | Default |
---|---|---|
druid.selectors.coordinator.serviceName | The druid.service name of the Coordinator service. To start the Coordinator with a different name, set it with this property. | druid/coordinator |
Announcing segments
You can configure how to announce and unannounce Znodes in ZooKeeper (using Curator). For normal operations you do not need to override any of these configs.
Batch data segment announcer
In current Druid, multiple data segments may be announced under the same Znode.
Property | Description | Default |
---|---|---|
druid.announcer.segmentsPerNode | Each Znode contains info for up to this many segments. | 50 |
druid.announcer.maxBytesPerNode | Max byte size for Znode. Allowed range is [1024, 1048576]. | 524288 |
druid.announcer.skipDimensionsAndMetrics | Skip Dimensions and Metrics list from segment announcements. NOTE: Enabling this will also remove the dimensions and metrics list from Coordinator and Broker endpoints. | false |
druid.announcer.skipLoadSpec | Skip segment LoadSpec from segment announcements. NOTE: Enabling this will also remove the loadspec from Coordinator and Broker endpoints. | false |
If you want to turn off the batch data segment announcer, you can add a property to skip announcing segments. You do not want to enable this config if you have any services using batch for druid.serverview.type.
Property | Description | Default |
---|---|---|
druid.announcer.skipSegmentAnnouncementOnZk | Skip announcing segments to ZooKeeper. Note that the batch server view will not work if this is set to true. | false |
JavaScript
Druid supports dynamic runtime extension through JavaScript functions. This functionality can be configured through the following properties.
Property | Description | Default |
---|---|---|
druid.javascript.enabled | Set to "true" to enable JavaScript functionality. This affects the JavaScript parser, filter, extractionFn, aggregator, post-aggregator, router strategy, and worker selection strategy. | false |
JavaScript-based functionality is disabled by default. Please refer to the Druid JavaScript programming guide for guidelines about using Druid's JavaScript functionality, including instructions on how to enable it.
Double column storage
Prior to version 0.13.0, Druid's storage layer used a 32-bit float representation to store columns created by the doubleSum, doubleMin, and doubleMax aggregators at indexing time. Starting from version 0.13.0, the default is a 64-bit float representation for double columns. Using a 64-bit representation for double columns avoids precision loss at the cost of doubling the storage size of such columns. To keep the old format, set the system-wide property druid.indexing.doubleStorage=float. You can also use floatSum, floatMin, and floatMax to use a 32-bit float representation.
Support for 64-bit floating point columns was released in Druid 0.11.0, so if you use this feature then older versions of Druid will not be able to read your data segments.
Property | Description | Default |
---|---|---|
druid.indexing.doubleStorage | Set to "float" to use a 32-bit float representation for double columns. | double |
HTTP client
All Druid components can communicate with each other over HTTP.
Property | Description | Default |
---|---|---|
druid.global.http.numConnections | Size of connection pool per destination URL. If there are more HTTP requests than this number that all need to speak to the same URL, then they will queue up. | 20 |
druid.global.http.eagerInitialization | Indicates that HTTP connections should be eagerly initialized. If set to true, numConnections connections are created upon initialization. | false |
druid.global.http.compressionCodec | Compression codec to communicate with others. May be "gzip" or "identity". | gzip |
druid.global.http.readTimeout | The timeout for data reads. | PT15M |
druid.global.http.unusedConnectionTimeout | The timeout for idle connections in the connection pool. A connection in the pool will be closed after this timeout and a new one will be established. This timeout should be less than druid.global.http.readTimeout; set it to approximately 90% of druid.global.http.readTimeout. | PT4M |
druid.global.http.numMaxThreads | Maximum number of I/O worker threads | max(10, ((number of cores * 17) / 16 + 2) + 30) |
druid.global.http.clientConnectTimeout | The timeout (in milliseconds) for establishing client connections. | 500 |
Common endpoints configuration
This section contains the configuration options for endpoints that are supported by all services.
Property | Description | Default |
---|---|---|
druid.server.hiddenProperties | If a property name or a substring of a property name (case insensitive) is in this list, responses from the /status/properties endpoint do not show that property. | ["druid.s3.accessKey","druid.s3.secretKey","druid.metadata.storage.connector.password", "password", "key", "token", "pwd"] |