# kJQ Filter Language Guide for LLMs

## What is kJQ?

kJQ is a filter language for processing Kafka record data. It is based on jq but adapted for Kafka workloads. A kJQ
program is a "filter" that takes a Kafka record as input and evaluates to a boolean: `true` includes the record and
`false` excludes it.

## Core Syntax Patterns

### Basic Structure

```
input | transform1 | transform2 | comparison
```
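As a sketch of that shape, the filter below (with a hypothetical `.value.currency` field) reads one field, applies two string transforms, and ends in a comparison that yields the boolean result:

```kjq
.value.currency | trim | upper-case == "USD"
```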

### Common Kafka Message Structure

```json
{
  "key": {
    "user_id": "12345",
    "session": "abc"
  },
  "value": {
    "amount": 100,
    "status": "pending",
    "timestamp": "2023-06-20T10:30:00Z"
  },
  "size": 156,
  "key-size": 32,
  "value-size": 124
}
```
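Applied to the sample record above, these filters should produce the results noted in the comments, assuming the semantics described in this guide:

```kjq
.value.amount > 100                      # false (100 is not greater than 100)
.value.amount >= 100                     # true
.value.status == "pending"               # true
.key.user_id | startswith("123")         # true
.key-size < .value-size                  # true (32 < 124)
```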

## Field Access and Indexing

### Object Field Access

```kjq
.key.field_name          # Access key fields
.value.field_name        # Access value fields (most common)
.value.nested.field      # Navigate nested objects
.["field-with-dashes"]   # Bracket notation for special characters
."field.with.dots"       # Quoted notation for special characters
```

### Array Indexing

```kjq
.value.items[0]          # First element (zero-indexed)
.value.items[-1]         # Last element
.value.items[-2]         # Second-to-last element
```

### Array/String Slicing

```kjq
.value.transaction_id[0:3]     # Characters 0-2 ("TXN" from "TXN12345")
.value.logs[-10:]              # Last 10 characters
.value.data[:5]                # First 5 characters
.value.items[1:4]              # Array elements 1-3
```
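Indexing and slicing combine with field access and comparisons. A sketch, assuming hypothetical `transaction_id` and `items[].status` fields; the slice test here is equivalent to `startswith("TXN")`:

```kjq
# Transaction ID begins with "TXN" and the last line item has shipped
(.value.transaction_id[0:3] == "TXN") and
(.value.items[-1].status == "shipped")
```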

## Complete Transform Reference

### Type Conversion Transforms

```kjq
to-long          # Convert to long integer
to-double        # Convert to double-precision number
to-string        # Convert to string
to-uuid          # Convert string to UUID type
parse-json       # Parse JSON string into data structure
```
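Type conversions matter most when a number or a nested document arrives as a string. A sketch, assuming a hypothetical `.value.payload` string field holding JSON such as `{"retries": "3"}`:

```kjq
# Parse the embedded JSON, then convert the string field before comparing numerically
.value.payload | parse-json | .retries | to-long > 2
```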

### Date/Time Transforms

```kjq
from-date        # Parse a date string into a date value (comparable with now, durations, and #dt literals)
```

### String Transforms

```kjq
upper-case       # Convert to uppercase
lower-case       # Convert to lowercase  
trim             # Remove whitespace from both ends
ltrim            # Remove whitespace from left end
rtrim            # Remove whitespace from right end
```
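String transforms are typically chained to normalize input before a test. A sketch with a hypothetical `.value.email` field, so that `" Ops@Example.com "` and `"ops@example.com"` are treated alike:

```kjq
.value.email | trim | lower-case | endswith("@example.com")
```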

### Numeric Transforms

```kjq
min              # Minimum value from array
max              # Maximum value from array
floor            # Round down to nearest integer
ceil             # Round up to nearest integer
length           # Length of string, array, or object
```
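A sketch of the numeric transforms, assuming `.value.readings` is an array of numbers and `.value.amount` may be fractional (both field names are illustrative):

```kjq
# Highest reading exceeds 85
.value.readings | max > 85

# Amount of at least 100 after rounding down
.value.amount | to-double | floor >= 100
```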

### Array Transforms

```kjq
reverse          # Reverse array element order
sort             # Sort array elements ascending
unique           # Remove duplicate elements
first            # First element of array
last             # Last element of array
flatten          # Flatten nested arrays
```
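Array transforms return a new array (or a single element, for `first` and `last`) that can be indexed or compared further. A sketch with hypothetical `events` and `tags` arrays; the second filter compares two parenthesized pipelines that each start from the record:

```kjq
# Last event in the array has type "error"
.value.events | last | .type == "error"

# Tags contain no duplicates: de-duplicating does not shrink the array
(.value.tags | unique | length) == (.value.tags | length)
```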

### Object Transforms

```kjq
keys             # Get array of object keys
values           # Get array of object values
```

### Utility Transforms

```kjq
is-empty         # Returns true if empty (null, "", [], {})
```
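`keys`, `values`, and `is-empty` are mostly used for shape checks. A sketch, assuming a hypothetical `.value.metadata` object:

```kjq
# Metadata object is present and has at least one field
.value.metadata | is-empty | not

# Metadata has more than three fields
.value.metadata | keys | length > 3
```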

## Complete Function Reference

### String Functions

```kjq
startswith("prefix")     # Test if string starts with prefix
endswith("suffix")       # Test if string ends with suffix  
contains("substring")    # Test if string contains substring
inside("larger_string")  # Test if current value is in larger string
test("regex_pattern")    # Test if string matches regex
split("delimiter")       # Split string into array
join("separator")        # Join array into string with separator
```
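A sketch of the string functions with illustrative field names. Because `inside` is a substring test, a value such as `"S"` would also pass the second filter:

```kjq
# Order IDs of the form ORD-<digits>
.value.order_id | test("^ORD-[0-9]+$")

# Country code appears in a comma-separated allow-list
.value.country | inside("US,CA,MX")

# SKU prefix before the first "-" is "EU" (e.g. "EU-1234")
.value.sku | split("-") | .[0] == "EU"
```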

### Collection Functions

```kjq
has("key")               # Test if object has key or array has index
within("val1", "val2")   # Test if current value is in provided list
map(.selector)            # Transforms each element in a collection by applying the given selector expression
select(.selector)         # Filters elements where the selector expression produces a non-null, non-false result (only supported within `map`)
```
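A sketch of the collection functions, again with hypothetical field names:

```kjq
# Value object carries a discount field
.value | has("discount")

# Region is one of an allowed set
.value.region | within("us-east-1", "us-west-2")

# At least one line item is priced above 500
.value.items | map(.price) | max > 500
```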

## Operators and Precedence

### All Operators (by precedence, highest to lowest)

1. **Field access**: `.foo`, `.[0]`, `.foo.bar`
2. **Function calls**: `length`, `contains("x")`
3. **Multiplication/Division**: `*`, `/`, `%`, `mod`, `quot`, `rem`
4. **Addition/Subtraction**: `+`, `-`
5. **Comparisons**: `<`, `<=`, `>`, `>=`, `==`, `!=`
6. **Logical AND**: `and`
7. **Logical OR**: `or`
8. **Alternative**: `//` (null coalescing)
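
Because `//` sits at the bottom of this list, `a // b and c` groups as `a // (b and c)`, which is rarely what is meant. A sketch of explicit grouping, with hypothetical `priority` and `escalated` fields:

```kjq
# Apply each fallback first, then combine the boolean results
((.value.priority // "normal") == "high") or (.value.escalated // false)
```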

### Mathematical Operators

```kjq
+        # Addition
-        # Subtraction  
*        # Multiplication
/        # Division
%        # Modulo
mod      # Modulo (alternative syntax)
quot     # Integer division (quotient only)
rem      # Remainder operation
```
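Arithmetic binds tighter than comparisons in the precedence list, so an expression such as a line total needs no extra parentheses. A sketch with illustrative fields:

```kjq
# Line total above 1000
.value.price * .value.quantity > 1000

# Even order numbers (convert first, in case the field is a string)
(.value.order_number | to-long) % 2 == 0
```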

## Time and Duration Operations

### Current Time

```kjq
now                      # Current timestamp
now - pt1h              # One hour ago
now + pt30m             # 30 minutes from now
```

### Duration Syntax (ISO 8601)

```kjq
pt5m     # 5 minutes
pt1h     # 1 hour
pt48h    # 2 days, written in hours (pt2d is not valid ISO 8601)
pt168h   # 1 week, written in hours
pt30s    # 30 seconds
```

### Date Operations

```kjq
.value.created_at | from-date > now - pt24h
.value.start_date | from-date >= #dt "2023-01-01T00:00:00Z"
```

## Extended Data Types and Literals

### Tagged Literals

```kjq
#dt "2025-01-01T10:30:00Z"                    # Date literal
#uuid "550e8400-e29b-41d4-a716-446655440000"  # UUID literal
```
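Tagged literals are compared against values of the matching type, so string fields are converted first. A sketch with hypothetical `account_id` and `created_at` fields:

```kjq
# Match one specific account
.value.account_id | to-uuid == #uuid "550e8400-e29b-41d4-a716-446655440000"

# Created before the start of 2025
.value.created_at | from-date < #dt "2025-01-01T00:00:00Z"
```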

### Data Type Support

- Basic JSON: `null`, `boolean`, `number`, `string`, `array`, `object`
- Extended: `date`, `double`, `uuid`, `keyword` (Clojure-style like `:topic`)

## Kafka-Specific Features

### Record Metadata Access

```kjq
.size > 1000             # Total serialized record size
.key-size < 100          # Key size in bytes  
.value-size >= 500       # Value size in bytes
.key == null             # Records with null keys
```

## Comprehensive Examples

### Basic Value Filtering

```kjq
# Simple comparisons
.value.amount > 100
.value.status == "approved"
.value.user_id != null

# String operations
.value.email | contains("@company.com")
.value.error_code | startswith("ERR")
.value.log_message | endswith("COMPLETE")
```

### Complex Business Logic

```kjq
# E-commerce order validation
(.value.order.total | to-double >= 50.00) and 
(.value.order.items | length >= 2) and 
(.value.customer.tier | within("gold", "platinum")) and 
(.value.shipping.country | within("US", "CA"))

# Financial transaction monitoring
(.value.amount | to-double > 10000) and 
(.value.transaction_type == "wire_transfer") and
(.value.created_at | from-date > now - pt1h) and
(.value.risk_score | to-long <= 3)

# IoT sensor alerting  
(.value.temperature | to-double > 85.0) and
(.value.sensor_location | startswith("datacenter")) and
(.value.readings | length >= 3) and
(.value.last_calibration | from-date < now - pt168h)
```

### String Processing Pipelines

```kjq
# Clean and validate user input
(.value.comment | trim | lower-case | length > 10) and
(.value.comment | lower-case | contains("spam") | not)

# Extract and validate transaction IDs  
(.value.transaction_ref | split("-") | .[0] == "TXN") and
(.value.transaction_ref[4:8] | test("^[0-9]{4}$"))

# Email domain validation
(.value.email | split("@") | .[1] | lower-case | endswith(".com")) and
(.value.email | test("^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"))
```

### Array and Object Analysis

```kjq
# Tag-based filtering: after sorting, "production" precedes "urgent" (assumes no tag sorts between them)
.value.tags | sort | join(",") | contains("production,urgent")

# Configuration validation
(.value.config | keys | length == 5) and
(.value.config | values | .[2] | to-long > 1000)

# Multi-level nested access
(.value.events[0].metadata.severity | within("high", "critical")) and
(.value.events | length >= 2)
```

### Date Range and Duration Filtering

```kjq
# Recent activity (last hour)
.value.last_seen | from-date > now - pt1h

# Business hours (09:00-17:00 in the string's own offset), read from the HH:MM slice of an
# ISO 8601 string such as "2023-06-20T10:30:00Z"
(.value.created_at[11:16] >= "09:00") and
(.value.created_at[11:16] <= "17:00")

# Event duration analysis
(.value.end_time | from-date) - (.value.start_time | from-date) > pt30m
```

### Record Size and Performance Filtering

```kjq
# Large record detection
(.size > 1000) and (.value.payload | length > 500)

# Keys larger than values, with more than 10 key fields
(.key-size > .value-size) and (.key | keys | length > 10)

# Small records with a non-empty value
(.size < 100) and (.value | is-empty | not)
```

### Null Handling and Fallbacks

```kjq
# Safe navigation with fallbacks
(.value.user.premium_status // false) and
((.value.user.subscription_level // "basic") | within("premium", "enterprise"))

# Complex fallback chains
((.value.primary_email // .value.backup_email // .value.contact.email) | contains("@")) and
(.value.user_verified // false)
```

### Map and Select

```kjq
# At least one event has user_id "user_123"
.value.events | map(.user_id) | contains("user_123")

# At least one event has a non-null, non-false error_code
.value.events | map(select(.error_code)) | is-empty | not
```

## Key Differences from Standard jq

### What's Different

- **Function naming**: kebab-case (`to-double` not `tonumber`)
- **Extended types**: Native UUID, date, double support
- **Time operations**: Built-in `now` and duration syntax
- **Kafka metadata**: Access to `.size`, `.key-size`, `.value-size`
- **Simplified feature set**: No variable binding, assignments, or complex generators

### What's Missing from jq

- Variable assignment (`as $var`)
- Update operations (`|=`, `+=`)
- Path expressions (`path()`, `paths`)
- Advanced generators (`range()`, `until()`)
- Complex object manipulation (`with_entries()`)
- Stream processing (`tostream()`, `fromstream()`)

## LLM Generation Guidelines

When generating kJQ filters:

1. **Start simple** - Build from basic field access
2. **Use proper names** - Always use kJQ function names (kebab-case)
3. **Add parentheses** - Group complex expressions for clarity
4. **Handle nulls** - Use `//` for safe fallbacks
5. **Convert types** - Use `to-double`, `to-long`, `to-string` as needed
6. **Think pipelines** - Chain operations with `|`
7. **Focus on boolean results** - Filters must evaluate to true/false
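
A sketch of a filter built by following these steps, with hypothetical `amount` and `currency` fields: start from field access, add conversions and null fallbacks, then group each clause so the whole expression yields one boolean.

```kjq
# 1. Field access:           .value.amount
# 2. Convert and compare:    .value.amount | to-double > 500
# 3. Null-safe and grouped:
((.value.amount // 0) | to-double > 500) and
((.value.currency // "") | upper-case == "USD")
```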
