Related: Cloud computing | Az900 | Software engineering
Context: Learned and applied KQL during my internship to optimize Azure resource allocation.
Definition
Kusto is a query language whose primary purpose is to query large volumes of structured and semi-structured data for real-time analytics, making it a powerful tool for immediate data insights.
Notes
Syntax
- The ‘let’ keyword allows you to define a variable that can be used later in the query.
- | pipe operator. It passes the results from one line of code to the next
- Perf is a standard table in Azure monitor logs that collects performance data
-
Data Filters:
- Most important part of KQL they allow you to narrow down massive oceans of data into small specific subsets.
- Primary filtering operator is where, it filters a table to the subset of rows which the condition is true syntax looks like:
TableName | where ColumnName Operator Value-
Operators:
- They can be mathematical or string based
-
Mathematical:
- == Equals (case-sensitive)
- != Not equal to
-
Greater than `
-
= Greater than or equal to `
- < Less than `
- <= Less than or equal to `
- =~ Equals (case-insensitive) `
- !~ Not equal to (case-insensitive)
-
String:
- contains Finds a substring (case-sensitive). "Administrator" contains "admin".
- !contains Does not contain a substring (case-sensitive). `
- contains_cs Explicitly case-sensitive contains. (Same as contains). `
- containsi Case-insensitive contains. `
- startswith Checks if a string begins with a value (case-sensitive). `
- endswith Checks if a string ends with a value (case-sensitive). `
- has Finds a whole term (case-insensitive). Faster than contains. Use this if you are looking for a whole word, not part of one. Message has "error" will match "An error occurred" but not "There was an errorcode".
- matches regex Matches a regular expression. Most powerful, but slower.
-
Data Transformation:
Extend: add a new column to our table without making any chnages to the tableSplit (): breaks text apart-
tostring(),toreal(),toint()- data type conversions
-
Data analysis:
Summarize: groups data and calculates statisticspercentile()- finds statistical percentilesround(), ceiling()- mathematical functionsjoin- combines datasets
-
Advanced commands:
iif()- conditional logic (if-then-else)todynamic()- JSON parsingisnotempty()- checks for data existence
-
Formatting Output:
project- selects columns to displaysort- arranges results in order
Code
// Define a variable called 'cpuP95' that will store CPU performance data
// 'let' keyword creates a variable that can be reused later in the query
let cpuP95 = Perf // 'Perf' is a standard table in azure monitor that contains performance counter data
// These are commented-out filters for specific Azure subscriptions (production and test environments)
// The '//' at the start means these lines are disabled/commented out
// | where _SubscriptionId == "26bcf52b-a494-4ed3-a25a-0f6e862c6957" // opt-intpriv-prod-01
// | where _SubscriptionId == "fdf972ed-bbb8-4045-a619-7d3d8e5a8b6c" // opt-intpriv-test-01
// This is a commented-out filter that would include only specific database computers
// | where Computer contains "qnxtdb" and Computer !startswith "he1prhpqnxtdb00"
// The '|' symbol is called 'pipe' - it passes data from one operation to the next
// 'where' filters rows based on conditions. '!contains' means "does NOT contain"
| where Computer !contains "qnxt" // Filter OUT computers that contain "qnxt" in their name
// 'and' combines multiple conditions - ALL must be true
and Computer !contains "hpdw" // AND exclude computers with "hpdw" in their name
and Computer !contains "db" // AND exclude computers with "db" in their name
// Filter to only include CPU performance counters specifically
// This ensures we only get processor time data, not memory or disk data
| where ObjectName == "Processor Information" and CounterName == "% Processor Time"
// 'extend' adds new columns to our data without removing existing ones
// 'split()' function breaks apart text using a separator (in this case, the dot '.')
| extend ComputerNameParts = split(Computer, '.') // Split computer name by dots
// 'tostring()' converts data to text format
// [0] gets the first element from the split array (the server name part)
| extend ServerName = tostring(ComputerNameParts[0])
// 'summarize' groups data and performs calculations on each group
// 'percentile(value, 95)' finds the 95th percentile (a high-water mark of usage)
// 'round()' rounds decimal numbers to whole numbers
// 'by ServerName' means group the results by each unique server name
| summarize cpuP95 = round(percentile(CounterValue, 95)) by ServerName;
// Define another variable for memory commit percentage data
let memcommitP95 = Perf // Again using the Perf table, but for memory data this time
// Same commented-out subscription filters as above
// | where _SubscriptionId == "26bcf52b-a494-4ed3-a25a-0f6e862c6957" // opt-intpriv-prod-01
// | where _SubscriptionId == "fdf972ed-bbb8-4045-a619-7d3d8e5a8b6c" // opt-intpriv-test-01
// | where Computer contains "qnxtdb" and Computer !startswith "he1prhpqnxtdb00"
// Same computer name filters as the CPU query above
| where Computer !contains "qnxt"
and Computer !contains "hpdw"
and Computer !contains "db" // and Computer !contains "prodmd"
// This time we're filtering for memory performance counters instead of CPU
| where ObjectName == "Memory" and CounterName == "% Committed Bytes In Use"
// Same server name extraction logic as above
| extend ComputerNameParts = split(Computer, '.')
| extend ServerName = tostring(ComputerNameParts[0])
// Calculate 95th percentile of memory commit percentage for each server
| summarize memcommitP95 = round(percentile(CounterValue, 95)) by ServerName;
// Define a third variable that gets detailed memory and CPU information
let memAndCpuDetails = InsightsMetrics // 'InsightsMetrics' is a different table with more detailed VM metrics
// Same commented-out subscription filters
// | where _SubscriptionId == "26bcf52b-a494-4ed3-a25a-0f6e862c6957" // opt-intpriv-prod-01
// | where _SubscriptionId == "fdf972ed-bbb8-4045-a619-7d3d8e5a8b6c" // opt-intpriv-test-01
// | where Computer contains "qnxtdb" and Computer !startswith "he1prhpqnxtdb00"
// Same computer name filters as above
| where Computer !contains "qnxt"
and Computer !contains "hpdw"
and Computer !contains "db" // and Computer !contains "prodmd"
// Filter for data that comes from Azure Monitor for VMs
| where Origin == "vm.azm.ms"
// 'in' operator checks if a value is in a list of options
// We only want Memory and Processor namespace data
| where Namespace in ("Memory", "Processor")
// Same server name extraction as above
| extend ComputerNameParts = split(Computer, '.')
| extend ServerName = tostring(ComputerNameParts[0])
// 'todynamic()' converts JSON text into a usable object structure
| extend Tags = todynamic(Tags) // Tags contain metadata about the VM
// 'iif()' is like an if-then-else statement: iif(condition, value_if_true, value_if_false)
// 'toreal()' converts text to decimal numbers
// Extract total memory size from VM metadata, but only for Memory namespace records
| extend TotalMemoryMB = iif(Namespace == "Memory", toreal(Tags["vm.azm.ms/memorySizeMB"]), dynamic(null))
// Convert memory from megabytes to gigabytes (divide by 1024)
// 'isnotempty()' checks if a value exists and isn't empty
| extend TotalMemoryGB = iif(isnotempty(TotalMemoryMB), TotalMemoryMB / 1024, dynamic(null))
// Round memory to nearest even number of GB for cleaner reporting
// 'ceiling()' rounds up to next whole number
| extend TotalMemGB = iif(isnotempty(TotalMemoryGB), ceiling(TotalMemoryGB / 2) * 2, dynamic(null))
// Extract number of CPU cores from VM metadata, but only for Processor namespace records
// 'toint()' converts text to whole numbers
| extend totalCpus = iif(Namespace == "Processor", toint(Tags["vm.azm.ms/totalCpus"]), dynamic(null))
// Calculate available memory as a percentage of total memory
// 'Val' column contains the actual metric value
| extend AvailableMemoryPercentage = iif(Namespace == "Memory", (toreal(Val) / TotalMemoryMB) * 100.0, dynamic(null))
// Calculate used memory percentage (opposite of available)
// If we have 20% available, then 80% is used
| extend UsedMemPercentage = iif(isnotempty(AvailableMemoryPercentage), round(100 - AvailableMemoryPercentage), dynamic(null))
// Group by server and calculate summary statistics
// 'max()' gets the highest value in each group (since these values should be the same for all records from one server)
| summarize
memP95 = round(percentile(UsedMemPercentage, 95)), // 95th percentile of memory usage
TotalMemGB = max(TotalMemGB), // Total memory size
totalCpus = max(totalCpus) // Total number of CPUs
by ServerName; // Group all calculations by server name
// Now combine all the data together and analyze it
cpuP95 // Start with the CPU data we calculated earlier
// 'join' combines data from two sources based on matching values
// 'kind=inner' means only include servers that exist in BOTH datasets
| join kind=inner (memAndCpuDetails) on ServerName // Match servers by name
// Create a new column that mimics the memory commit data format
// Multiply by 1.0 to ensure it's treated as a decimal number
| extend memUseP95 = memP95 * 1.0 // mimic memcommitP95 = 100 (not available in InsightsMetrics)
// Create an optimization recommendation column using nested if-then-else logic
| extend ['Optimized?'] = iif(
// First condition: if server has only 2 CPUs, it's at minimum specs
totalCpus == 2,
"✅ minimum", // Label for 2-CPU servers
iif(
// Second condition: low CPU and memory usage with decent specs = over-provisioned
cpuP95 < 50 and memUseP95 < 50 and totalCpus >= 4 and TotalMemGB >= 14,
"❌ no", // This server could be downsized
iif(
// Third condition: high CPU but low memory, and good memory-to-CPU ratio
cpuP95 >= 40 and memUseP95 < 40 and (TotalMemGB / totalCpus) >= 8,
"⚠️ less-memory", // Could reduce memory allocation
iif(
// Fourth condition: low CPU but high memory, and low memory-to-CPU ratio
cpuP95 < 50 and memUseP95 >= 50 and (TotalMemGB / totalCpus) <= 8,
"⚠️ less-cpu", // Could reduce CPU allocation
"✅ yes" // Everything else is considered well-optimized
)
)
)
)
// 'project' selects which columns to include in the final output
// This is like 'SELECT' in SQL - it chooses what to display
| project ServerName, ['Optimized?'], cpuP95, memUseP95, totalCpus, TotalMemGB
// 'sort' arranges the results in order
// 'asc' means ascending (lowest to highest)
// Sort first by memory usage, then by CPU usage (both lowest first)
| sort by memUseP95 asc, cpuP95 asc
Steps to make my own script
- Define Scope: Define time period and environments you want to target
- Choose data sources: Perf table for basics , InsightsMetrics for details
- Set thresholds: Set what percent of usage is considered low or high
- Build filters: Customize filters to isolate the exact resource you want to examine
- Calculate Percentiles: 95th percentile for realistic peak usage
- Create Logic: If/then rules for optimization recommendations
- Sort Output: Show biggest opportunities first