#intern #azure #cloud

Related: Cloud computing | Az900 | Software engineering

Context: Learned and applied KQL during my internship to optimize Azure resource allocation.

Definition

Kusto is a query language whose primary purpose is to query large volumes of structured and semi-structured data for real-time analytics, making it a powerful tool for immediate data insights.

Notes

Syntax

Code

// Define a variable called 'cpuP95' that will store CPU performance data

// 'let' keyword creates a variable that can be reused later in the query

let cpuP95 = Perf  // 'Perf' is a standard table in azure monitor that contains performance counter data

    // These are commented-out filters for specific Azure subscriptions (production and test environments)

    // The '//' at the start means these lines are disabled/commented out

    // | where _SubscriptionId == "26bcf52b-a494-4ed3-a25a-0f6e862c6957" // opt-intpriv-prod-01

    // | where _SubscriptionId == "fdf972ed-bbb8-4045-a619-7d3d8e5a8b6c" // opt-intpriv-test-01

    // This is a commented-out filter that would include only specific database computers

    // | where Computer contains "qnxtdb" and Computer !startswith "he1prhpqnxtdb00"

    // The '|' symbol is called 'pipe' - it passes data from one operation to the next

    // 'where' filters rows based on conditions. '!contains' means "does NOT contain"

    | where Computer !contains "qnxt"  // Filter OUT computers that contain "qnxt" in their name

        // 'and' combines multiple conditions - ALL must be true

        and Computer !contains "hpdw"  // AND exclude computers with "hpdw" in their name

        and Computer !contains "db"    // AND exclude computers with "db" in their name

    // Filter to only include CPU performance counters specifically

    // This ensures we only get processor time data, not memory or disk data

    | where ObjectName == "Processor Information" and CounterName == "% Processor Time"

    // 'extend' adds new columns to our data without removing existing ones

    // 'split()' function breaks apart text using a separator (in this case, the dot '.')

    | extend ComputerNameParts = split(Computer, '.')  // Split computer name by dots

    // 'tostring()' converts data to text format

    // [0] gets the first element from the split array (the server name part)

    | extend ServerName = tostring(ComputerNameParts[0])

    // 'summarize' groups data and performs calculations on each group

    // 'percentile(value, 95)' finds the 95th percentile (a high-water mark of usage)

    // 'round()' rounds decimal numbers to whole numbers

    // 'by ServerName' means group the results by each unique server name

    | summarize cpuP95 = round(percentile(CounterValue, 95)) by ServerName;

  

// Define another variable for memory commit percentage data

let memcommitP95 = Perf  // Again using the Perf table, but for memory data this time

    // Same commented-out subscription filters as above

    // | where _SubscriptionId == "26bcf52b-a494-4ed3-a25a-0f6e862c6957" // opt-intpriv-prod-01

    // | where _SubscriptionId == "fdf972ed-bbb8-4045-a619-7d3d8e5a8b6c" // opt-intpriv-test-01

    // | where Computer contains "qnxtdb" and Computer !startswith "he1prhpqnxtdb00"

    // Same computer name filters as the CPU query above

    | where Computer !contains "qnxt"

        and Computer !contains "hpdw"

        and Computer !contains "db" // and Computer !contains "prodmd"

    // This time we're filtering for memory performance counters instead of CPU

    | where ObjectName == "Memory" and CounterName == "% Committed Bytes In Use"

    // Same server name extraction logic as above

    | extend ComputerNameParts = split(Computer, '.')

    | extend ServerName = tostring(ComputerNameParts[0])

    // Calculate 95th percentile of memory commit percentage for each server

    | summarize memcommitP95 = round(percentile(CounterValue, 95)) by ServerName;

  

// Define a third variable that gets detailed memory and CPU information

let memAndCpuDetails = InsightsMetrics  // 'InsightsMetrics' is a different table with more detailed VM metrics

    // Same commented-out subscription filters

    // | where _SubscriptionId == "26bcf52b-a494-4ed3-a25a-0f6e862c6957" // opt-intpriv-prod-01

    // | where _SubscriptionId == "fdf972ed-bbb8-4045-a619-7d3d8e5a8b6c" // opt-intpriv-test-01

    // | where Computer contains "qnxtdb" and Computer !startswith "he1prhpqnxtdb00"

    // Same computer name filters as above

    | where Computer !contains "qnxt"

        and Computer !contains "hpdw"

        and Computer !contains "db" // and Computer !contains "prodmd"

    // Filter for data that comes from Azure Monitor for VMs

    | where Origin == "vm.azm.ms"

    // 'in' operator checks if a value is in a list of options

    // We only want Memory and Processor namespace data

    | where Namespace in ("Memory", "Processor")

    // Same server name extraction as above

    | extend ComputerNameParts = split(Computer, '.')

    | extend ServerName = tostring(ComputerNameParts[0])

    // 'todynamic()' converts JSON text into a usable object structure

    | extend Tags = todynamic(Tags)  // Tags contain metadata about the VM

    // 'iif()' is like an if-then-else statement: iif(condition, value_if_true, value_if_false)

    // 'toreal()' converts text to decimal numbers

    // Extract total memory size from VM metadata, but only for Memory namespace records

    | extend TotalMemoryMB = iif(Namespace == "Memory", toreal(Tags["vm.azm.ms/memorySizeMB"]), dynamic(null))

    // Convert memory from megabytes to gigabytes (divide by 1024)

    // 'isnotempty()' checks if a value exists and isn't empty

    | extend TotalMemoryGB = iif(isnotempty(TotalMemoryMB), TotalMemoryMB / 1024, dynamic(null))

    // Round memory to nearest even number of GB for cleaner reporting

    // 'ceiling()' rounds up to next whole number

    | extend TotalMemGB = iif(isnotempty(TotalMemoryGB), ceiling(TotalMemoryGB / 2) * 2, dynamic(null))

    // Extract number of CPU cores from VM metadata, but only for Processor namespace records

    // 'toint()' converts text to whole numbers

    | extend totalCpus = iif(Namespace == "Processor", toint(Tags["vm.azm.ms/totalCpus"]), dynamic(null))

    // Calculate available memory as a percentage of total memory

    // 'Val' column contains the actual metric value

    | extend AvailableMemoryPercentage = iif(Namespace == "Memory", (toreal(Val) / TotalMemoryMB) * 100.0, dynamic(null))

    // Calculate used memory percentage (opposite of available)

    // If we have 20% available, then 80% is used

    | extend UsedMemPercentage = iif(isnotempty(AvailableMemoryPercentage), round(100 - AvailableMemoryPercentage), dynamic(null))

    // Group by server and calculate summary statistics

    // 'max()' gets the highest value in each group (since these values should be the same for all records from one server)

    | summarize

        memP95 = round(percentile(UsedMemPercentage, 95)),  // 95th percentile of memory usage

        TotalMemGB = max(TotalMemGB),                       // Total memory size

        totalCpus = max(totalCpus)                          // Total number of CPUs

        by ServerName;  // Group all calculations by server name

  

// Now combine all the data together and analyze it

cpuP95  // Start with the CPU data we calculated earlier

// 'join' combines data from two sources based on matching values

// 'kind=inner' means only include servers that exist in BOTH datasets

| join kind=inner (memAndCpuDetails) on ServerName  // Match servers by name

  

// Create a new column that mimics the memory commit data format

// Multiply by 1.0 to ensure it's treated as a decimal number

| extend memUseP95 = memP95 * 1.0 // mimic memcommitP95 = 100 (not available in InsightsMetrics)

  

// Create an optimization recommendation column using nested if-then-else logic

| extend ['Optimized?'] = iif(

    // First condition: if server has only 2 CPUs, it's at minimum specs

    totalCpus == 2,

    "✅ minimum",           // Label for 2-CPU servers

    iif(

        // Second condition: low CPU and memory usage with decent specs = over-provisioned

        cpuP95 < 50 and memUseP95 < 50 and totalCpus >= 4 and TotalMemGB >= 14,

        "❌ no",           // This server could be downsized

        iif(

            // Third condition: high CPU but low memory, and good memory-to-CPU ratio

            cpuP95 >= 40 and memUseP95 < 40 and (TotalMemGB / totalCpus) >= 8,

            "⚠️ less-memory",  // Could reduce memory allocation

            iif(

                // Fourth condition: low CPU but high memory, and low memory-to-CPU ratio

                cpuP95 < 50 and memUseP95 >= 50 and (TotalMemGB / totalCpus) <= 8,

                "⚠️ less-cpu",     // Could reduce CPU allocation

                "✅ yes"           // Everything else is considered well-optimized

            )

        )

    )

)

  

// 'project' selects which columns to include in the final output

// This is like 'SELECT' in SQL - it chooses what to display

| project ServerName, ['Optimized?'], cpuP95, memUseP95, totalCpus, TotalMemGB

  

// 'sort' arranges the results in order

// 'asc' means ascending (lowest to highest)

// Sort first by memory usage, then by CPU usage (both lowest first)

| sort by memUseP95 asc, cpuP95 asc 

Steps to make my own script

  1. Define Scope: Define time period and environments you want to target
  2. Choose data sources: Perf table for basics , InsightsMetrics for details
  3. Set thresholds: Set what percent of usage is considered low or high
  4. Build filters: Customize filters to isolate the exact resource you want to examine
  5. Calculate Percentiles: 95th percentile for realistic peak usage
  6. Create Logic:  If/then rules for optimization recommendations
  7. Sort Output: Show biggest opportunities first