Monitoring as Code with New Relic and Terraform

HasanB
Apr 18, 2022

New Relic offers rich observability capabilities. If you are exploring New Relic as a monitoring solution, you have very likely created at least one dashboard.

Large organisations typically have multiple dashboards to monitor several environments, workflows, APMs and so on, and the New Relic UI isn't exactly the fastest UI out there. When we first adopted New Relic, we began duplicating dashboards for each environment and editing the NRQL behind each one, which was a tedious process. This made me wonder how we could automate the rollout of dashboards and alerts across our infrastructure.

Luckily, New Relic has a sophisticated Terraform integration in the form of its official provider!

The New Relic documentation only covers single-page, single-chart dashboards. As a complex organisation, we needed multi-page, multi-widget dashboards along with alerts, which made the challenge exciting.

First, in your init file, declare the Terraform and provider versions along with your API key. The example below pins the 2.x series of the New Relic provider (2.21 or later).

# Initialize Terraform
terraform {
  # Require Terraform 1.0.x, version 1.0.3 or later
  required_version = "~> 1.0.3"

  # Require a 2.x release of the New Relic provider, version 2.21 or later
  required_providers {
    newrelic = {
      source  = "newrelic/newrelic"
      version = "~> 2.21"
    }
  }
}

# New Relic provider details
provider "newrelic" {
  account_id = 1234567        # Placeholder: replace with your 7-digit New Relic account ID (a number, not a string)
  api_key    = "NRAK-YOURKEY" # Your New Relic user key
  region     = "US"           # US or EU (defaults to US)
}
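Hard-coding the user key in the init file makes it easy to commit it to source control by accident. A minimal alternative sketch, assuming you pass the values in as Terraform input variables (the variable names newrelic_account_id and newrelic_api_key are hypothetical), could look like this:

```hcl
# Hypothetical input variables for the provider credentials
variable "newrelic_account_id" {
  type        = number
  description = "New Relic account ID"
}

variable "newrelic_api_key" {
  type        = string
  description = "New Relic user key (NRAK-...)"
  sensitive   = true # Keeps the key out of plan/apply output
}

# Same provider block as above, but reading values from the variables
provider "newrelic" {
  account_id = var.newrelic_account_id
  api_key    = var.newrelic_api_key
  region     = "US"
}
```

You can then supply the values through a terraform.tfvars file kept out of version control, or through the TF_VAR_newrelic_account_id and TF_VAR_newrelic_api_key environment variables. The provider documentation also describes reading NEW_RELIC_ACCOUNT_ID and NEW_RELIC_API_KEY directly from the environment.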

Run terraform init, which downloads the provider plugin and initializes the working directory. The Terraform state file itself is written on the first terraform apply.

Creating a Multi-Page, Multi-Widget Dashboard

Creating simple dashboards is easy, but multi-chart dashboards are a headache, mostly because you need to work out each widget's row, column, height, width and type.

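Luckily, I've figured out a layout and page structure that works, so you don't have to pull your hair out. Dashboards use a 12-column grid, so three widgets of width 4 placed at columns 1, 5 and 9 fill one row. Stack three such rows at rows 1, 4 and 7 (each widget has height 3) and you get a 3x3 group of charts on a page.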
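Rather than working out each position by hand, Terraform can derive it from a widget's index. The sketch below assumes a hypothetical local list of charts and uses a dynamic block: entry i lands at row floor(i / 3) * 3 + 1 and column (i % 3) * 4 + 1, so nine entries would fill the full 3x3 grid. The Transaction queries are stand-ins for your own NRQL.

```hcl
# Hypothetical list of charts; swap in your own titles and NRQL
locals {
  page1_charts = [
    { title = "Throughput", query = "SELECT count(*) FROM Transaction TIMESERIES SINCE 30 minutes ago" },
    { title = "Avg duration", query = "SELECT average(duration) FROM Transaction TIMESERIES SINCE 30 minutes ago" },
    { title = "95th percentile duration", query = "SELECT percentile(duration, 95) FROM Transaction TIMESERIES SINCE 30 minutes ago" },
    { title = "Error rate", query = "SELECT percentage(count(*), WHERE error IS true) FROM Transaction TIMESERIES SINCE 30 minutes ago" },
  ]
}

resource "newrelic_one_dashboard" "grid_example" {
  name = "Grid Example Dashboard"

  page {
    name = "Page 1"

    # One widget_line block per list entry; for a list, widget_line.key is the index
    dynamic "widget_line" {
      for_each = local.page1_charts

      content {
        title  = widget_line.value.title
        row    = floor(widget_line.key / 3) * 3 + 1 # rows 1, 4, 7
        column = (widget_line.key % 3) * 4 + 1       # columns 1, 5, 9
        height = 3
        width  = 4

        nrql_query {
          query = widget_line.value.query
        }
      }
    }
  }
}
```

With the four entries shown, the first three fill row 1 and the fourth starts row 4 at column 1; adding or removing entries reflows the grid without touching any coordinates.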

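Be careful when choosing the widget type, which is set by the block name (for example widget_line); if the type doesn't suit the query, the dashboard may not render.

Available types include widget_line, widget_area, widget_bar, widget_billboard, widget_pie, widget_table and widget_markdown.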
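Other widget types follow the same row, column, height and width pattern. As a hedged sketch (the dashboard and resource names are hypothetical), a billboard displays a single number, so its query has no TIMESERIES clause, while a markdown widget takes text instead of an nrql_query block:

```hcl
resource "newrelic_one_dashboard" "widget_types_example" {
  name = "Widget Types Example"

  page {
    name = "Summary"

    # Billboard: a single value, so the query has no TIMESERIES clause
    widget_billboard {
      title  = "Transactions (last 30 minutes)"
      row    = 1
      column = 1
      height = 3
      width  = 4

      nrql_query {
        query = "SELECT count(*) FROM Transaction SINCE 30 minutes ago"
      }
    }

    # Markdown: static text rendered in the widget, no NRQL involved
    widget_markdown {
      title  = "Notes"
      row    = 1
      column = 5
      height = 3
      width  = 8 # Columns 5 to 12 use the rest of the row
      text   = "## Notes\n\nThis dashboard is managed by Terraform."
    }
  }
}
```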

# Create a new dashboard
resource "newrelic_one_dashboard" "dash_au_prod" {
  name = "Your Dashboard Name"

  page {
    name = "Page 1"

    # Three widgets of width 4 fill one row of the 12-column grid (columns 1, 5, 9)
    widget_line {
      title  = "CPU %"
      row    = 1
      column = 1
      height = 3
      width  = 4

      nrql_query {
        query = "SELECT latest(`host.cpuPercent`) FROM Metric WHERE `host.fullHostname` LIKE '%prod_cluster%' AND apmApplicationNames = '|java_apm_1|' FACET `host.fullHostname` TIMESERIES SINCE 30 minutes ago"
      }
    }

    widget_line {
      title  = "Memory %"
      row    = 1
      column = 5
      height = 3
      width  = 4

      nrql_query {
        query = "NRQL query here" # Replace with a valid NRQL query
      }
    }

    widget_line {
      title  = "Web Requests"
      row    = 1
      column = 9
      height = 3
      width  = 4

      nrql_query {
        query = "NRQL query here" # Replace with a valid NRQL query
      }
    }
  } # page close

  page {
    name = "Page 2"

    # Pie charts need a FACET clause to produce slices
    widget_pie {
      title  = "APM Transactions"
      row    = 1
      column = 1
      height = 3
      width  = 4

      nrql_query {
        query = "SELECT count(*) FROM Transaction WHERE appName = 'dotnet_core' FACET `host` LIMIT 10 SINCE 30 minutes ago EXTRAPOLATE"
      }
    }

    widget_table {
      title  = "Host load"
      row    = 1
      column = 5
      height = 3
      width  = 4

      nrql_query {
        query = "NRQL query here" # Replace with a valid NRQL query
      }
    }

    widget_table {
      title  = "AU PROD load Average"
      row    = 1
      column = 9
      height = 3
      width  = 4

      nrql_query {
        query = "NRQL query here" # Replace with a valid NRQL query
      }
    }
  } # page close

} # dashboard close
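This also tackles the duplication problem from the start of the post: instead of copying a dashboard per environment and editing its NRQL by hand, one resource can be stamped out per environment with for_each. A sketch, assuming a hypothetical environments variable that maps each environment name to the hostname pattern used in its queries:

```hcl
# Hypothetical map of environment name => hostname pattern used in the NRQL
variable "environments" {
  type = map(string)
  default = {
    prod    = "%prod_cluster%"
    staging = "%staging_cluster%"
  }
}

# One dashboard per environment; each.key is the name, each.value the pattern
resource "newrelic_one_dashboard" "per_env" {
  for_each = var.environments
  name     = "${upper(each.key)} Hosts"

  page {
    name = "Hosts"

    widget_line {
      title  = "CPU %"
      row    = 1
      column = 1
      height = 3
      width  = 4

      nrql_query {
        query = "SELECT latest(`host.cpuPercent`) FROM Metric WHERE `host.fullHostname` LIKE '${each.value}' FACET `host.fullHostname` TIMESERIES SINCE 30 minutes ago"
      }
    }
  }
}
```

terraform plan should then list one instance per key, such as newrelic_one_dashboard.per_env["prod"], and adding an environment becomes a one-line change to the map.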

Creating Alert Policies and Alerts

Well, if you also wished for a way to create alerts on the fly, I won't leave you high and dry.

To create alerts for an APM application, create an alert policy resource and attach alert conditions to it, filtering each condition's query on your application by its exact name or ID. Below is an example that should get you going:

# Create a new alert policy
resource "newrelic_alert_policy" "load_test_alert_policy" {
  name = "Load Testing Alerts Policy"
}

# .NET exceptions: alert on System.Net.WebException
resource "newrelic_nrql_alert_condition" "load_test_dot_net_exceptions" {
  policy_id                    = newrelic_alert_policy.load_test_alert_policy.id
  type                         = "static"
  name                         = "System.Net.WebException alert"
  description                  = "System.Net.WebException count > 100 per minute for 5 minutes"
  enabled                      = true
  value_function               = "single_value"
  violation_time_limit_seconds = 86400

  nrql {
    # Replace YOUR_APM_ID with your application's numeric appId
    query             = "SELECT count(*) FROM TransactionError WHERE appId = YOUR_APM_ID AND `error.expected` IS NOT true AND `error.class` = 'System.Net.WebException' FACET `error.class`"
    evaluation_offset = 3
  }

  critical {
    # With the default 60-second aggregation window, "ALL" means every
    # window in the 300-second duration must be above the threshold
    operator              = "above"
    threshold             = 100
    threshold_duration    = 300
    threshold_occurrences = "ALL"
  }
}
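If you would rather reference the APM application by its exact name than hard-code YOUR_APM_ID, the provider's newrelic_entity data source can look up its numeric application ID. A sketch, assuming an APM application named dotnet_core (the name used in the pie chart query above); the data source and condition names are hypothetical:

```hcl
# Look up the APM application by its exact name (assumed to be "dotnet_core")
data "newrelic_entity" "dotnet_app" {
  name   = "dotnet_core"
  domain = "APM"
  type   = "APPLICATION"
}

# Same kind of condition as above, but the appId comes from the lookup
resource "newrelic_nrql_alert_condition" "dotnet_web_exceptions" {
  policy_id                    = newrelic_alert_policy.load_test_alert_policy.id
  type                         = "static"
  name                         = "System.Net.WebException alert (dotnet_core)"
  enabled                      = true
  violation_time_limit_seconds = 86400

  nrql {
    query             = "SELECT count(*) FROM TransactionError WHERE appId = ${data.newrelic_entity.dotnet_app.application_id} AND `error.class` = 'System.Net.WebException'"
    evaluation_offset = 3
  }

  critical {
    operator              = "above"
    threshold             = 100
    threshold_duration    = 300
    threshold_occurrences = "ALL"
  }
}
```

If the name doesn't match exactly, the lookup fails at plan time, which surfaces a typo before any alert is created.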

Hopefully, this moves you a step closer to observability as code!
