How to set up credentials
dlt
automatically extracts configuration settings and secrets based on flexible naming conventions.
It then injects these values where needed in functions decorated with @dlt.source
, @dlt.resource
, or @dlt.destination
.
- Configuration refers to non-sensitive settings that define a data pipeline's behavior. These include file paths, database hosts, timeouts, API URLs, and performance settings.
- Secrets are sensitive data like passwords, API keys, and private keys. They should never be hard-coded to avoid security risks.
Available config providers
There are multiple ways to define configurations and credentials for your pipelines. dlt
looks for these definitions in the following order during pipeline execution:
Environment Variables: If a value for a specific argument is found in an environment variable, dlt will use it and will not proceed to search in lower-priority providers.
Vaults: Credentials specified in vaults like Google Secrets Manager, Azure Key Vault, AWS Secrets Manager.
secrets.toml and config.toml files: These files are used for storing both configuration values and secrets.
secrets.toml
is dedicated to sensitive information, whileconfig.toml
contains non-sensitive configuration data.Custom Providers added with
register_provider
: This is a custom provider implementation you can design yourself. A custom config provider is helpful if you want to use your own configuration file structure or perform advanced preprocessing of configs and secrets.Default Argument Values: These are the values specified in the function's signature.
Please make sure your pipeline name contains no whitespace or any other punctuation characters except "-"
and "_"
. This way you will ensure your code is working with any configuration option.
Naming convention
dlt
uses a specific naming hierarchy to search for the secrets and configs values. This makes configurations and secrets easy to manage.
To keep the naming convention flexible, dlt
looks for a lot of possible combinations of key names, starting from the most specific possible path. Then, if the value is not found, it removes the right-most section and tries again.
The most specific possible path for sources looks like:
- TOML config provider
- Environment variables
- In the code
[<pipeline_name>.sources.<source_module_name>.<source_function_name>]
<argument_name>="some_value"
export PIPELINE_NAME__SOURCES__SOURCE_MODULE_NAME__SOURCE_FUNCTION_NAME__ARGUMENT_NAME="some_value"
import os
os.environ["PIPELINE_NAME__SOURCES__SOURCE_MODULE_NAME__SOURCE_FUNCTION_NAME__ARGUMENT_NAME"] = "some_value"
The most specific possible path for destinations looks like:
- TOML config provider
- Environment variables
- In the code
[<pipeline_name>.destination.<destination name>.credentials]
<credential_option>="some_value"
export PIPELINE_NAME__DESTINATION__DESTINATION_NAME__CREDENTIALS__CREDENTIAL_VALUE="some_value"
import os
os.environ["PIPELINE_NAME__DESTINATION__DESTINATION_NAME__CREDENTIALS__CREDENTIAL_VALUE"] = "some_value"
Example
For example, if the source module is named pipedrive
and the source is defined as follows:
# pipedrive.py
@dlt.source
def deals(api_key: str = dlt.secrets.value):
pass
dlt
will search for the following names in this order:
sources.pipedrive.deals.api_key
sources.pipedrive.api_key
sources.api_key
api_key
You can use your pipeline name to have separate configurations for each pipeline in your project. All config values will be looked with the pipeline name first and then again without it.
[pipeline_name_1.sources.google_sheets.credentials]
client_email = "<client_email_1>"
private_key = "<private_key_1>"
project_id = "<project_id_1>"
[pipeline_name_2.sources.google_sheets.credentials]
client_email = "<client_email_2>"
private_key = "<private_key_2>"
project_id = "<project_id_2>"
Credential types
In most cases, credentials are just key-value pairs, but in some cases, the actual structure of credentials could be quite complex and support several ways of setting it up.
For example, to connect to a sql_database
source, you can either set up a connection string:
[sources.sql_database]
credentials="snowflake://user:password@service-account/database?warehouse=warehouse_name&role=role"
or set up all parameters of connection separately:
[sources.sql_database.credentials]
drivername="snowflake"
username="user"
password="password"
database = "database"
host = "service-account"
warehouse = "warehouse_name"
role = "role"
dlt
can work with both ways and convert one to another. To learn more about which credential types are supported, visit the complex credential types page.
Environment variables
dlt
prioritizes security by looking in environment variables before looking into the .toml files.
The format of lookup keys is slightly different from secrets files because for environment variables, all names are capitalized, and sections are separated with a double underscore "__"
. For example, to specify the Facebook Ads access token through environment variables, you would need to set up:
export SOURCES__FACEBOOK_ADS__ACCESS_TOKEN="<access_token>"
Check out the example of setting up credentials through environment variables.
To organize development and securely manage environment variables for credentials storage, you can use the python-dotenv to automatically load variables from an .env
file.
Vaults
Vault integration methods vary based on the vault type. Check out our example involving Google Cloud Secrets Manager. For other vault integrations, you are welcome to contact sales to learn about our building blocks for data platform teams.
secrets.toml and config.toml
The TOML config provider in dlt utilizes two TOML files:
config.toml
:
- Configs refer to non-sensitive configuration data. These are settings, parameters, or options that define the behavior of a data pipeline.
- They can include things like file paths, database hosts and timeouts, API URLs, performance settings, or any other settings that affect the pipeline's behavior.
- Accessible in code through
dlt.config.values
secrets.toml
:
- Secrets are sensitive information that should be kept confidential, such as passwords, API keys, private keys, and other confidential data.
- It's crucial to never hard-code secrets directly into the code, as it can pose a security risk.
- Accessible in code through
dlt.secrets.values
By default, the .gitignore
file in the project prevents secrets.toml
from being added to version control and pushed. However, config.toml
can be freely added to version control.
Location
The TOML provider always loads those files from the .dlt
folder, located relative to the current working directory.
For example, if your working directory is my_dlt_project
and your project has the following structure:
my_dlt_project:
|
pipelines/
|---- .dlt/secrets.toml
|---- google_sheets.py
and you run
python pipelines/google_sheets.py
then dlt
will look for secrets in my_dlt_project/.dlt/secrets.toml
and ignore the existing my_dlt_project/pipelines/.dlt/secrets.toml
.
If you change your working directory to pipelines
and run
python google_sheets.py
dlt
will look for my_dlt_project/pipelines/.dlt/secrets.toml
as (probably) expected.
The TOML provider also has the capability to read files from ~/.dlt/
(located in the user's home directory) in addition to the local project-specific .dlt
folder.
Structure
dlt
organizes sections in TOML files in a specific structure required by the injection mechanism.
Understanding this structure gives you more flexibility in setting credentials. For more details, see Toml files structure.
Custom Providers
You can use the CustomLoaderDocProvider
classes to supply a custom dictionary to dlt
for use
as a supplier of config
and secret
values. The code below demonstrates how to use a config stored in config.json
.
import dlt
from dlt.common.configuration.providers import CustomLoaderDocProvider
# create a function that loads a dict
def load_config():
with open("config.json", "rb") as f:
config_dict = json.load(f)
# create the custom provider
provider = CustomLoaderDocProvider("my_json_provider",load_config)
# register provider
dlt.config.register_provider(provider)
Check our an example for a yaml
based config provider that supports switchable profiles.
Examples
Setup both configurations and secrets
dlt
recognizes two types of data: secrets and configurations. The main difference is that secrets contain sensitive information,
while configurations hold non-sensitive information and can be safely added to version control systems like git.
This means you have more flexibility with configurations. You can set up configurations directly in the code,
but it is strongly advised not to do this with secrets.
You can put all configurations and credentials in the secrets.toml
if it's more convenient.
However, credentials cannot be placed in configs.toml
because dlt
doesn't look for them there.
Let's assume we have a notion source and filesystem destination:
- TOML config provider
- Environment variables
- In the code
# we can set up a lot in config.toml
# config.toml
[runtime]
log_level="INFO"
[destination.filesystem]
bucket_url = "s3://[your_bucket_name]"
[normalize.data_writer]
disable_compression=true
# but credentials should go to secrets.toml!
# secrets.toml
[source.notion]
api_key = "api_key"
[destination.filesystem.credentials]
aws_access_key_id = "ABCDEFGHIJKLMNOPQRST" # copy the access key here
aws_secret_access_key = "1234567890_access_key" # copy the secret access key here
# Environment variables are set up the same way both for configs and secrets
export RUNTIME__LOG_LEVEL="INFO"
export DESTINATION__FILESYSTEM__BUCKET_URL="s3://[your_bucket_name]"
export NORMALIZE__DATA_WRITER__DISABLE_COMPRESSION="true"
export SOURCE__NOTION__API_KEY="api_key"
export DESTINATION__FILESYSTEM__CREDENTIALS__AWS_ACCESS_KEY_ID="api_key"
export DESTINATION__FILESYSTEM__CREDENTIALS__AWS_SECRET_ACCESS_KEY="api_key"
import os
import dlt
# you can freely set up configuration directly in the code
# via env vars
os.environ["RUNTIME__LOG_LEVEL"] = "INFO"
os.environ["DESTINATION__FILESYSTEM__BUCKET_URL"] = "s3://[your_bucket_name]"
os.environ["NORMALIZE__DATA_WRITER__DISABLE_COMPRESSION"] = "true"
# or even directly to the dlt.config
dlt.config["runtime.log_level"] = "INFO"
dlt.config["destination.filesystem.bucket_url"] = "INFO"
dlt.config["normalize.data_writer.disable_compression"] = "true"
# but please, do not set up the secrets in the code!
# what you can do is reassign env variables:
os.environ["SOURCE__NOTION__API_KEY"] = os.environ.get("NOTION_KEY")
# or use a third-party credentials supplier
import botocore.session
credentials = AwsCredentials()
session = botocore.session.get_session()
credentials.parse_native_representation(session)
dlt.secrets["destination.filesystem.credentials"] = credentials
Google credentials for both source and destination
Let's assume we use the bigquery
destination and the google_sheets
source. They both use Google credentials and expect them to be configured under the credentials
key.
- If we create just a single
credentials
section like in here, the destination and source will share the same credentials.
- TOML config provider
- Environment variables
- In the code
[credentials]
client_email = "<client_email_both_for_destination_and_source>"
private_key = "<private_key_both_for_destination_and_source>"
project_id = "<project_id_both_for_destination_and_source>"
export CREDENTIALS__CLIENT_EMAIL="<client_email_both_for_destination_and_source>"
export CREDENTIALS__PRIVATE_KEY="<private_key_both_for_destination_and_source>"
export CREDENTIALS__PROJECT_ID="<project_id_both_for_destination_and_source>"
import os
# do not set up the secrets directly in the code!
# what you can do is reassign env variables
os.environ["CREDENTIALS__CLIENT_EMAIL"] = os.environ.get("GOOGLE_CLIENT_EMAIL")
os.environ["CREDENTIALS__PRIVATE_KEY"] = os.environ.get("GOOGLE_PRIVATE_KEY")
os.environ["CREDENTIALS__PROJECT_ID"] = os.environ.get("GOOGLE_PROJECT_ID")
- If we define sections as below, we'll keep the credentials separate
- TOML config provider
- Environment variables
- In the code
# google sheet credentials
[sources.credentials]
client_email = "<client_email from services.json>"
private_key = "<private_key from services.json>"
project_id = "<project_id from services json>"
# bigquery credentials
[destination.credentials]
client_email = "<client_email from services.json>"
private_key = "<private_key from services.json>"
project_id = "<project_id from services json>"
# google sheet credentials
export SOURCES__CREDENTIALS__CLIENT_EMAIL="<client_email>"
export SOURCES__CREDENTIALS__PRIVATE_KEY="<private_key>"
export SOURCES__CREDENTIALS__PROJECT_ID="<project_id>"
# bigquery credentials
export DESTINATION__CREDENTIALS__CLIENT_EMAIL="<client_email>"
export DESTINATION__CREDENTIALS__PRIVATE_KEY="<private_key>"
export DESTINATION__CREDENTIALS__PROJECT_ID="<project_id>"
import dlt
import os
# do not set up the secrets directly in the code!
# what you can do is reassign env variables
os.environ["DESTINATION__CREDENTIALS__CLIENT_EMAIL"] = os.environ.get("BIGQUERY_CLIENT_EMAIL")
os.environ["DESTINATION__CREDENTIALS__PRIVATE_KEY"] = os.environ.get("BIGQUERY_PRIVATE_KEY")
os.environ["DESTINATION__CREDENTIALS__PROJECT_ID"] = os.environ.get("BIGQUERY_PROJECT_ID")
# or set them to the dlt.secrets
dlt.secrets["sources.credentials.client_email"] = os.environ.get("SHEETS_CLIENT_EMAIL")
dlt.secrets["sources.credentials.private_key"] = os.environ.get("SHEETS_PRIVATE_KEY")
dlt.secrets["sources.credentials.project_id"] = os.environ.get("SHEETS_PROJECT_ID")
Now dlt
looks for destination credentials in the following order:
destination.bigquery.credentials --> Not found
destination.credentials --> Found
When looking for the source credentials:
sources.google_sheets_module.google_sheets_function.credentials --> Not found
sources.google_sheets_function.credentials --> Not found
sources.credentials --> Found
Credentials for several different sources and destinations
Let's assume we have several different Google sources and destinations. We can use full paths to organize the secrets.toml
file:
- TOML config provider
- Environment variables
- In the code
# google sheet credentials
[sources.google_sheets.credentials]
client_email = "<client_email from services.json>"
private_key = "<private_key from services.json>"
project_id = "<project_id from services json>"
# google analytics credentials
[sources.google_analytics.credentials]
client_email = "<client_email from services.json>"
private_key = "<private_key from services.json>"
project_id = "<project_id from services json>"
# bigquery credentials
[destination.bigquery.credentials]
client_email = "<client_email from services.json>"
private_key = "<private_key from services.json>"
project_id = "<project_id from services json>"
# google sheet credentials
export SOURCES__GOOGLE_SHEETS__CREDENTIALS__CLIENT_EMAIL="<client_email>"
export SOURCES__GOOGLE_SHEETS__CREDENTIALS__PRIVATE_KEY="<private_key>"
export SOURCES__GOOGLE_SHEETS__CREDENTIALS__PROJECT_ID="<project_id>"
# google analytics credentials
export SOURCES__GOOGLE_ANALYTICS__CREDENTIALS__CLIENT_EMAIL="<client_email>"
export SOURCES__GOOGLE_ANALYTICS__CREDENTIALS__PRIVATE_KEY="<private_key>"
export SOURCES__GOOGLE_ANALYTICS__CREDENTIALS__PROJECT_ID="<project_id>"
# bigquery credentials
export DESTINATION__BIGQUERY__CREDENTIALS__CLIENT_EMAIL="<client_email>"
export DESTINATION__BIGQUERY__CREDENTIALS__PRIVATE_KEY="<private_key>"
export DESTINATION__BIGQUERY__CREDENTIALS__PROJECT_ID="<project_id>"
import os
import dlt
# do not set up the secrets directly in the code!
# what you can do is reassign env variables
os.environ["SOURCES__GOOGLE_ANALYTICS__CREDENTIALS__CLIENT_EMAIL"] = os.environ.get("SHEETS_CLIENT_EMAIL")
os.environ["SOURCES__GOOGLE_ANALYTICS__CREDENTIALS__PRIVATE_KEY"] = os.environ.get("ANALYTICS_PRIVATE_KEY")
os.environ["SOURCES__GOOGLE_ANALYTICS__CREDENTIALS__PROJECT_ID"] = os.environ.get("ANALYTICS_PROJECT_ID")
os.environ["DESTINATION__CREDENTIALS__CLIENT_EMAIL"] = os.environ.get("BIGQUERY_CLIENT_EMAIL")
os.environ["DESTINATION__CREDENTIALS__PRIVATE_KEY"] = os.environ.get("BIGQUERY_PRIVATE_KEY")
os.environ["DESTINATION__CREDENTIALS__PROJECT_ID"] = os.environ.get("BIGQUERY_PROJECT_ID")
# or set them to the dlt.secrets
dlt.secrets["sources.credentials.client_email"] = os.environ.get("SHEETS_CLIENT_EMAIL")
dlt.secrets["sources.credentials.private_key"] = os.environ.get("SHEETS_PRIVATE_KEY")
dlt.secrets["sources.credentials.project_id"] = os.environ.get("SHEETS_PROJECT_ID")
Credentials for several sources of the same type
Let's assume we have several sources of the same type, how can we separate them in the secrets.toml
? The recommended solution is to use different pipeline names for each source:
- TOML config provider
- Environment variables
- In the code
[pipeline_name_1.sources.sql_database]
credentials="snowflake://user1:password1@service-account/database1?warehouse=warehouse_name&role=role1"
[pipeline_name_2.sources.sql_database]
credentials="snowflake://user2:password2@service-account/database2?warehouse=warehouse_name&role=role2"
export PIPELINE_NAME_1_SOURCES__SQL_DATABASE__CREDENTIALS="snowflake://user1:password1@service-account/database1?warehouse=warehouse_name&role=role1"
export PIPELINE_NAME_2_SOURCES__SQL_DATABASE__CREDENTIALS="snowflake://user2:password2@service-account/database2?warehouse=warehouse_name&role=role2"
import os
import dlt
# do not set up the secrets directly in the code!
# what you can do is reassign env variables
os.environ["PIPELINE_NAME_1_SOURCES__SQL_DATABASE__CREDENTIALS"] = os.environ.get("SQL_CREDENTIAL_STRING_1")
# or set them to the dlt.secrets
dlt.secrets["pipeline_name_2.sources.sql_database.credentials"] = os.environ.get("SQL_CREDENTIAL_STRING_2")
Understanding the exceptions
If dlt
expects configuration of secrets value but cannot find it, it will output the ConfigFieldMissingException
.
Let's run the chess.py
example without providing the password:
$ CREDENTIALS="postgres://loader@localhost:5432/dlt_data" python chess.py
...
dlt.common.configuration.exceptions.ConfigFieldMissingException: Following fields are missing: ['password'] in configuration with spec PostgresCredentials
for field "password" config providers and keys were tried in the following order:
In Environment Variables key CHESS_GAMES__DESTINATION__POSTGRES__CREDENTIALS__PASSWORD was not found.
In Environment Variables key CHESS_GAMES__DESTINATION__CREDENTIALS__PASSWORD was not found.
In Environment Variables key CHESS_GAMES__CREDENTIALS__PASSWORD was not found.
In secrets.toml key chess_games.destination.postgres.credentials.password was not found.
In secrets.toml key chess_games.destination.credentials.password was not found.
In secrets.toml key chess_games.credentials.password was not found.
In Environment Variables key DESTINATION__POSTGRES__CREDENTIALS__PASSWORD was not found.
In Environment Variables key DESTINATION__CREDENTIALS__PASSWORD was not found.
In Environment Variables key CREDENTIALS__PASSWORD was not found.
In secrets.toml key destination.postgres.credentials.password was not found.
In secrets.toml key destination.credentials.password was not found.
In secrets.toml key credentials.password was not found.
Please refer to https://dlthub.com/docs/general-usage/credentials for more information
It tells you exactly which paths dlt
looked at, via which config providers and in which order.
In the example above:
- First,
dlt
looked in a big sectionchess_games
, which is the name of the pipeline. - In each case, it starts with full paths and goes to the minimum path
credentials.password
. - First, it looks into environment variables, then in
secrets.toml
. It displays the exact keys tried. - Note that
config.toml
was skipped! It could not contain any secrets.