Commit Graph

11960 Commits

Author SHA1 Message Date
Chris Lu
eec9558925 Update postgres-examples/README.md
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-04 08:13:35 -07:00
chrislu
2e4ba5b2fc redirect GetUnflushedMessages to brokers hosting the topic partition 2025-09-04 08:10:19 -07:00
chrislu
19a3da757f Update README.md 2025-09-04 08:08:40 -07:00
chrislu
31d5960f00 int overflow 2025-09-04 08:08:36 -07:00
chrislu
cd928f9f38 heap sort the data sources 2025-09-04 08:08:23 -07:00
chrislu
6fcc573709 fix: Improve test stability for date/time functions
**Problem:**
- CURRENT_TIMESTAMP test had timing race condition that could cause flaky failures
- CURRENT_DATE test could fail if run exactly at midnight boundary
- Tests were too strict about timing precision without accounting for system variations

**Root Cause:**
- Test captured before/after timestamps and expected function result to be exactly between them
- No tolerance for clock precision differences, NTP adjustments, or system timing variations
- Date boundary race condition around midnight transitions

**Solution:**
- **CURRENT_TIMESTAMP test**: Added 100ms tolerance buffer to account for:
  - Clock precision differences between time.Now() calls
  - System timing variations and NTP corrections
  - Microsecond vs nanosecond precision differences

- **CURRENT_DATE test**: Enhanced to handle midnight boundary crossings:
  - Captures date before and after function call
  - Accepts either date value in case of midnight transition
  - Prevents false failures during overnight test runs

**Testing:**
- Verified with repeated test runs (5x iterations) - all pass consistently
- Full test suite passes - no regressions introduced
- Tests are now robust against timing edge cases

**Impact:**
🚀 **Eliminated flaky test failures** while maintaining function correctness validation
🔧 **Production-ready testing** that works across different system environments
**CI/CD reliability** - tests won't fail due to timing variations
2025-09-04 07:47:17 -07:00
chrislu
2b35cca9bd refactor: Split sql_functions.go into smaller, focused files
**File Structure Before:**
- sql_functions.go (850+ lines)
- sql_functions_test.go (1,205+ lines)

**File Structure After:**
- function_helpers.go (105 lines) - shared utility functions
- arithmetic_functions.go (205 lines) - arithmetic operators & math functions
- datetime_functions.go (170 lines) - date/time functions & constants
- string_functions.go (335 lines) - string manipulation functions
- arithmetic_functions_test.go (560 lines) - tests for arithmetic & math
- datetime_functions_test.go (370 lines) - tests for date/time functions
- string_functions_test.go (270 lines) - tests for string functions

**Benefits:**
- Better organization by functional domain
- Easier to find and maintain specific function types
- Smaller, more manageable file sizes
- Clear separation of concerns
- Improved code readability and navigation
- All tests passing - no functionality lost

**Total:** 7 focused files (2,015 lines) vs 2 monolithic files (2,055+ lines)

This refactoring improves maintainability while preserving all functionality.
2025-09-04 06:56:06 -07:00
chrislu
179a7b446e feat: Add comprehensive string functions with extensive tests
Implemented String Functions:
- LENGTH: Get string length (supports all value types)
- UPPER/LOWER: Case conversion
- TRIM/LTRIM/RTRIM: Whitespace removal (space, tab, newline, carriage return)
- SUBSTRING: Extract substring with optional length (SQL 1-based indexing)
- CONCAT: Concatenate multiple values (supports mixed types, skips nulls)
- REPLACE: Replace all occurrences of substring
- POSITION: Find substring position (1-based, 0 if not found)
- LEFT/RIGHT: Extract leftmost/rightmost characters
- REVERSE: Reverse string with proper Unicode support

Key Features:
- Robust type conversion (string, int, float, bool, bytes)
- Unicode-safe operations (proper rune handling in REVERSE)
- SQL-compatible indexing (1-based for SUBSTRING, POSITION)
- Comprehensive error handling with descriptive messages
- Mixed-type support (e.g., CONCAT number with string)

Helper Functions:
- valueToString: Convert any schema_pb.Value to string
- valueToInt64: Convert numeric values to int64

Comprehensive test suite with 25+ test cases covering:
- All string functions with typical use cases
- Type conversion scenarios (numbers, booleans)
- Edge cases (empty strings, null values, Unicode)
- Error conditions and boundary testing

All tests passing.
2025-09-04 00:21:17 -07:00
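As a rough illustration of the indexing and Unicode points in the commit above, the sketch below shows 1-based SUBSTRING and POSITION plus a rune-wise REVERSE. The function names and signatures are assumptions for this example and operate on plain Go strings rather than `schema_pb.Value`.

```go
package main

import (
	"fmt"
	"strings"
)

// substring follows SQL semantics: start is 1-based and counted in runes;
// a negative length means "to the end of the string".
func substring(s string, start, length int) string {
	r := []rune(s)
	if start < 1 {
		start = 1
	}
	if start > len(r) {
		return ""
	}
	end := len(r)
	if length >= 0 && start-1+length < end {
		end = start - 1 + length
	}
	return string(r[start-1 : end])
}

// position returns the 1-based rune index of sub in s, or 0 if absent.
func position(sub, s string) int {
	i := strings.Index(s, sub)
	if i < 0 {
		return 0
	}
	return len([]rune(s[:i])) + 1
}

// reverse swaps runes rather than bytes, so multi-byte characters survive.
// Combining sequences are not kept together by this simple approach.
func reverse(s string) string {
	r := []rune(s)
	for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
		r[i], r[j] = r[j], r[i]
	}
	return string(r)
}

func main() {
	fmt.Println(substring("hello", 2, 3))
	fmt.Println(position("lo", "hello"))
	fmt.Println(reverse("日本語"))
}
```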
chrislu
25b07fda6c feat: Add DATE_TRUNC function with comprehensive tests
- Implement comprehensive DATE_TRUNC function supporting:
  - Time precisions: microsecond, millisecond, second, minute, hour
  - Date precisions: day, week, month, quarter, year, decade, century, millennium
  - Support both singular and plural forms (e.g., 'minute' and 'minutes')
- Enhanced date/time parsing with proper timezone handling:
  - Assume local timezone for non-timezone string formats
  - Support UTC formats with explicit timezone indicators
  - Consistent behavior between parsing and truncation
- Comprehensive test suite with 11 test cases covering:
  - All supported precisions from microsecond to year
  - Multiple input types (TimestampValue, string dates)
  - Edge cases (null values, invalid precisions)
  - Timezone consistency validation

All tests passing.
2025-09-04 00:18:31 -07:00
chrislu
ac69d6e5c7 feat: Add date/time functions CURRENT_DATE, CURRENT_TIMESTAMP, EXTRACT with comprehensive tests
- Implement CURRENT_DATE returning YYYY-MM-DD format
- Add CURRENT_TIMESTAMP returning TimestampValue with microseconds
- Add CURRENT_TIME returning HH:MM:SS format
- Add NOW() as alias for CURRENT_TIMESTAMP
- Implement comprehensive EXTRACT function supporting:
  - YEAR, MONTH, DAY, HOUR, MINUTE, SECOND
  - QUARTER, WEEK, DOY (day of year), DOW (day of week)
  - EPOCH (Unix timestamp)
- Support multiple input formats:
  - TimestampValue (microseconds)
  - String dates (multiple formats)
  - Unix timestamps (int64 seconds)
- Comprehensive test suite with 15+ test cases covering:
  - All date/time constants
  - Extract from different value types
  - Error handling for invalid inputs
  - Timezone handling

All tests passing.
2025-09-04 00:16:22 -07:00
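To make the EXTRACT fields listed above concrete, here is a small sketch that reads them from a microsecond timestamp. The name `extract`, its signature, and the choice to interpret the value in UTC are assumptions for illustration; DOW follows Go's convention of 0 for Sunday and WEEK uses the ISO week number.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// extract returns one date/time field from a timestamp given in
// microseconds since the Unix epoch, interpreted in UTC.
func extract(part string, micros int64) (int64, error) {
	t := time.UnixMicro(micros).UTC()
	switch strings.ToUpper(part) {
	case "YEAR":
		return int64(t.Year()), nil
	case "MONTH":
		return int64(t.Month()), nil
	case "DAY":
		return int64(t.Day()), nil
	case "HOUR":
		return int64(t.Hour()), nil
	case "MINUTE":
		return int64(t.Minute()), nil
	case "SECOND":
		return int64(t.Second()), nil
	case "QUARTER":
		return (int64(t.Month())-1)/3 + 1, nil
	case "WEEK":
		_, week := t.ISOWeek()
		return int64(week), nil
	case "DOY":
		return int64(t.YearDay()), nil
	case "DOW":
		return int64(t.Weekday()), nil // 0 = Sunday
	case "EPOCH":
		return t.Unix(), nil
	}
	return 0, fmt.Errorf("unsupported field %q", part)
}

func main() {
	const micros = int64(1672531200) * 1_000_000 // 2023-01-01T00:00:00Z
	for _, p := range []string{"year", "quarter", "week", "doy", "dow", "epoch"} {
		v, err := extract(p, micros)
		fmt.Println(p, v, err)
	}
}
```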
chrislu
cc3ac76304 feat: Add mathematical functions ROUND, CEIL, FLOOR, ABS with comprehensive tests
- Implement ROUND with optional precision parameter
- Add CEIL function for rounding up to nearest integer
- Add FLOOR function for rounding down to nearest integer
- Add ABS function for absolute values with type preservation
- Support all numeric types (int32, int64, float32, double)
- Comprehensive test suite with 20+ test cases covering:
  - Positive/negative numbers
  - Integer/float type preservation
  - Precision handling for ROUND
  - Null value error handling
  - Edge cases (zero, large numbers)

All tests passing.
2025-09-04 00:14:51 -07:00
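ROUND with an optional precision can be expressed by scaling, rounding, and unscaling. The sketch below assumes plain `float64` and `int64` inputs instead of the engine's value types, and `round` and `absInt64` are hypothetical helper names. Note that `math.Round` rounds halves away from zero.

```go
package main

import (
	"fmt"
	"math"
)

// round rounds x to the given number of decimal places.
func round(x float64, precision int) float64 {
	scale := math.Pow(10, float64(precision))
	return math.Round(x*scale) / scale
}

// absInt64 keeps the integer type instead of converting to float.
// math.MinInt64 has no positive counterpart and is returned unchanged.
func absInt64(v int64) int64 {
	if v < 0 {
		return -v
	}
	return v
}

func main() {
	fmt.Println(round(3.14159, 2))
	fmt.Println(round(-2.5, 0))
	fmt.Println(math.Ceil(-1.5), math.Floor(-1.5))
	fmt.Println(absInt64(-7))
}
```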
chrislu
32bd48ffb8 feat: Add basic arithmetic operators (+, -, *, /, %) with comprehensive tests
- Implement EvaluateArithmeticExpression with support for all basic operators
- Handle type conversions between int, float, string, and boolean
- Add proper error handling for division/modulo by zero
- Include 14 comprehensive test cases covering all edge cases
- Support mixed type arithmetic (int + float, string numbers, etc.)

All tests passing.
2025-09-04 00:13:51 -07:00
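One way to picture mixed-type arithmetic with division-by-zero checks is to coerce both operands to `float64` first. This is a simplified sketch, not `EvaluateArithmeticExpression` itself: `toFloat` and `eval` are hypothetical names, and always returning a float is an assumption that ignores integer type preservation.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// toFloat coerces ints, floats, numeric strings, and booleans to float64.
func toFloat(v any) (float64, error) {
	switch x := v.(type) {
	case int:
		return float64(x), nil
	case int64:
		return float64(x), nil
	case float64:
		return x, nil
	case string:
		return strconv.ParseFloat(strings.TrimSpace(x), 64)
	case bool:
		if x {
			return 1, nil
		}
		return 0, nil
	}
	return 0, fmt.Errorf("unsupported operand type %T", v)
}

// eval applies a binary arithmetic operator after coercing both operands.
func eval(op string, left, right any) (float64, error) {
	a, err := toFloat(left)
	if err != nil {
		return 0, err
	}
	b, err := toFloat(right)
	if err != nil {
		return 0, err
	}
	switch op {
	case "+":
		return a + b, nil
	case "-":
		return a - b, nil
	case "*":
		return a * b, nil
	case "/":
		if b == 0 {
			return 0, fmt.Errorf("division by zero")
		}
		return a / b, nil
	case "%":
		if b == 0 {
			return 0, fmt.Errorf("modulo by zero")
		}
		return math.Mod(a, b), nil
	}
	return 0, fmt.Errorf("unsupported operator %q", op)
}

func main() {
	fmt.Println(eval("+", 2, 1.5))
	fmt.Println(eval("/", "10", 4))
	fmt.Println(eval("%", 7, 0))
}
```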
chrislu
bdf3f7caa9 fix 2025-09-04 00:07:06 -07:00
chrislu
a17062218c fix 2025-09-03 22:57:47 -07:00
chrislu
6dcade043b code reuse 2025-09-03 21:55:32 -07:00
Chris Lu
623a278a0f Update SQL_FEATURE_PLAN.md
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-03 21:55:12 -07:00
Chris Lu
5adea57224 Update weed/util/log_buffer/log_buffer.go
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2025-09-03 21:54:51 -07:00
chrislu
d192536376 fix 2025-09-03 21:44:27 -07:00
chrislu
eb03d05c97 fmt 2025-09-03 17:58:11 -07:00
chrislu
bdce5439d8 fixes 2025-09-03 17:57:06 -07:00
chrislu
1db1206827 fix splitting multiple SQLs 2025-09-03 17:47:24 -07:00
chrislu
ea758d0d9f remove sleep 2025-09-03 17:43:54 -07:00
chrislu
69e6902072 handling errors 2025-09-03 17:42:30 -07:00
chrislu
4060ea34a9 Update SQL_FEATURE_PLAN.md 2025-09-03 17:42:15 -07:00
chrislu
191bad0a21 timeout client connections 2025-09-03 15:49:27 -07:00
chrislu
323193cf8c no more mysql 2025-09-03 15:34:39 -07:00
chrislu
bec567598f fix tests, avoid panic 2025-09-03 10:27:50 -07:00
chrislu
48a9bee3b8 fix describe issue 2025-09-03 10:16:19 -07:00
chrislu
50040a68bb fix 2025-09-03 09:54:31 -07:00
chrislu
c10a0ba2fd fmt 2025-09-03 08:11:32 -07:00
chrislu
72d332a352 feat: Add window function foundation with timestamp support
Added comprehensive foundation for SQL window functions with timestamp analytics:

Core Window Function Types:
- WindowSpec with PartitionBy and OrderBy support
- WindowFunction struct for ROW_NUMBER, RANK, LAG, LEAD
- OrderByClause for timestamp-based ordering
- Extended SelectStatement to support WindowFunctions field

Timestamp Analytics Functions:
- ApplyRowNumber() - ROW_NUMBER() OVER (ORDER BY timestamp)
- ExtractYear() - Extract year from TIMESTAMP logical type
- ExtractMonth() - Extract month from TIMESTAMP logical type
- ExtractDay() - Extract day from TIMESTAMP logical type
- FilterByYear() - Filter records by timestamp year

Foundation for Advanced Window Functions:
- LAG/LEAD for time-series access to previous/next values
- RANK/DENSE_RANK for temporal ranking
- FIRST_VALUE/LAST_VALUE for window boundaries
- PARTITION BY support for grouped analytics

This enables sophisticated time-series analytics like:
- SELECT *, ROW_NUMBER() OVER (ORDER BY timestamp) FROM user_events WHERE EXTRACT(YEAR FROM timestamp) = 2024
- Trend analysis over time windows
- Session analytics with LAG/LEAD functions
- Time-based ranking and percentiles

Ready for production time-series analytics with proper timestamp logical type support! 🚀
2025-09-03 07:33:31 -07:00
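The semantics of `ROW_NUMBER() OVER (PARTITION BY ... ORDER BY timestamp)` can be sketched without any engine types. The `event` struct and `rowNumberByUser` below are hypothetical and are not the `WindowSpec` or `ApplyRowNumber()` code from the commit; they only show how numbering restarts per partition while rows are visited in timestamp order.

```go
package main

import (
	"fmt"
	"sort"
)

// event is a hypothetical row with a partition key and a timestamp.
type event struct {
	UserID   string
	TsMicros int64
}

// rowNumberByUser returns, for each input row, its 1-based row number
// within its UserID partition when ordered by TsMicros ascending.
func rowNumberByUser(rows []event) []int {
	order := make([]int, len(rows))
	for i := range order {
		order[i] = i
	}
	// Stable sort keeps input order for rows with equal timestamps.
	sort.SliceStable(order, func(a, b int) bool {
		return rows[order[a]].TsMicros < rows[order[b]].TsMicros
	})
	counters := map[string]int{}
	out := make([]int, len(rows))
	for _, idx := range order {
		counters[rows[idx].UserID]++
		out[idx] = counters[rows[idx].UserID]
	}
	return out
}

func main() {
	rows := []event{{"a", 30}, {"b", 10}, {"a", 20}, {"b", 40}}
	fmt.Println(rowNumberByUser(rows))
}
```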
chrislu
699e2f4413 feat: Add logical type support to SQL query engine
Extended SQL engine to handle new Parquet logical types:
- Added TimestampValue comparison support (microsecond precision)
- Added DateValue comparison support (days since epoch)
- Added DecimalValue comparison support with string conversion
- Added TimeValue comparison support (microseconds since midnight)
- Enhanced valuesEqual(), valueLessThan(), valueGreaterThan() functions
- Added decimalToString() helper for precise decimal-to-string conversion
- Imported math/big for arbitrary precision decimal handling

The SQL engine can now:
- Compare TIMESTAMP values for filtering (e.g., WHERE timestamp > 1672531200000000)
- Compare DATE values for date-based queries (e.g., WHERE birth_date >= 12345)
- Compare DECIMAL values for precise financial calculations
- Compare TIME values for time-of-day filtering

Next: Add YEAR(), MONTH(), DAY() extraction functions for date analytics.
2025-09-03 07:29:03 -07:00
chrislu
3570027656 feat: Enable publishers to use Parquet logical types
Enhanced MQ publishers to utilize the new logical types:
- Updated convertToRecordValue() to use TimestampValue instead of an RFC3339 string
- Added DateValue support for birth_date field (days since epoch)
- Added DecimalValue support for precise_amount field with configurable precision/scale
- Enhanced UserEvent struct with PreciseAmount and BirthDate fields
- Added convertToDecimal() helper using big.Rat for precise decimal conversion
- Updated test data generator to produce varied birth dates (1970-2005) and precise amounts

Publishers now generate structured data with proper logical types:
- TIMESTAMP: Microsecond precision UTC timestamps
- DATE: Birth dates as days since Unix epoch
- DECIMAL: Precise amounts with 18-digit precision, 4-decimal scale

Successfully tested with PostgreSQL integration - all topics created with logical type data.
2025-09-03 07:26:36 -07:00
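A fixed-scale decimal is commonly stored as an unscaled integer, and `math/big` allows that conversion without floating-point error. The sketch below assumes that representation; `toUnscaled` is a hypothetical helper, not the `convertToDecimal()` function named in the commit, and it truncates extra digits toward zero.

```go
package main

import (
	"fmt"
	"math/big"
)

// toUnscaled parses a decimal string exactly and returns value * 10^scale
// as an integer, truncating any digits beyond the scale toward zero.
func toUnscaled(s string, scale int) (*big.Int, error) {
	r, ok := new(big.Rat).SetString(s)
	if !ok {
		return nil, fmt.Errorf("invalid decimal %q", s)
	}
	pow := new(big.Int).Exp(big.NewInt(10), big.NewInt(int64(scale)), nil)
	r.Mul(r, new(big.Rat).SetInt(pow))
	return new(big.Int).Quo(r.Num(), r.Denom()), nil
}

func main() {
	for _, s := range []string{"123.4567", "0.1", "-0.00015"} {
		u, err := toUnscaled(s, 4)
		fmt.Println(s, u, err)
	}
}
```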
chrislu
ec1e74a6e8 feat: Add Parquet logical types to mq_schema.proto
Added support for Parquet logical types in SeaweedFS message queue schema:
- TIMESTAMP: UTC timestamp in microseconds since epoch with timezone flag
- DATE: Date as days since Unix epoch (1970-01-01)
- DECIMAL: Arbitrary precision decimal with configurable precision/scale
- TIME: Time of day in microseconds since midnight

These types enable advanced analytics features:
- Time-based filtering and window functions
- Date arithmetic and year/month/day extraction
- High-precision numeric calculations
- Proper time zone handling for global deployments

Regenerated protobuf Go code with new scalar types and value messages.
2025-09-03 07:18:58 -07:00
chrislu
d60c542ecc feat: Replace pg_query_go with lightweight SQL parser (no CGO required)
- Remove github.com/pganalyze/pg_query_go/v6 dependency to avoid CGO requirement
- Implement lightweight SQL parser for basic SELECT, SHOW, and DDL statements
- Fix operator precedence in WHERE clause parsing (split on AND/OR first, so comparisons bind more tightly than the logical operators)
- Support INTEGER, FLOAT, and STRING literals in WHERE conditions
- All SQL engine tests passing with new parser
- PostgreSQL integration tests can now build without CGO

The lightweight parser handles the essential SQL features needed for the
SeaweedFS query engine while maintaining compatibility and avoiding CGO
dependencies that caused Docker build issues.
2025-09-03 07:11:18 -07:00
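The precedence point can be illustrated with a tiny recursive-descent evaluator in which OR is handled at the outermost level, AND below it, and comparisons at the leaves. This is a sketch of the general technique only and is not the parser introduced in the commit: the grammar, the `parser` type, whitespace-separated tokens, and integer-only columns are all assumptions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Grammar, lowest precedence first:
//   expr := term { OR term }
//   term := cmp  { AND cmp }
//   cmp  := IDENT ("=" | "<" | ">") INTEGER
type parser struct {
	toks []string
	pos  int
	row  map[string]int64
}

func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

func (p *parser) next() string { t := p.peek(); p.pos++; return t }

// binary parses operands joined by the keyword kw, combining with fold.
func (p *parser) binary(kw string, operand func() (bool, error), fold func(a, b bool) bool) (bool, error) {
	v, err := operand()
	for err == nil && strings.EqualFold(p.peek(), kw) {
		p.next()
		var r bool
		if r, err = operand(); err == nil {
			v = fold(v, r)
		}
	}
	return v, err
}

func (p *parser) expr() (bool, error) {
	return p.binary("OR", p.term, func(a, b bool) bool { return a || b })
}

func (p *parser) term() (bool, error) {
	return p.binary("AND", p.cmp, func(a, b bool) bool { return a && b })
}

func (p *parser) cmp() (bool, error) {
	col, op, lit := p.next(), p.next(), p.next()
	left, ok := p.row[col]
	if !ok {
		return false, fmt.Errorf("unknown column %q", col)
	}
	right, err := strconv.ParseInt(lit, 10, 64)
	if err != nil {
		return false, fmt.Errorf("bad integer literal %q", lit)
	}
	switch op {
	case "=":
		return left == right, nil
	case "<":
		return left < right, nil
	case ">":
		return left > right, nil
	}
	return false, fmt.Errorf("unsupported operator %q", op)
}

// evalWhere evaluates a whitespace-tokenized WHERE expression against row.
func evalWhere(where string, row map[string]int64) (bool, error) {
	p := &parser{toks: strings.Fields(where), row: row}
	v, err := p.expr()
	if err == nil && p.pos < len(p.toks) {
		err = fmt.Errorf("unexpected token %q", p.peek())
	}
	return v, err
}

func main() {
	row := map[string]int64{"a": 1, "b": 5}
	fmt.Println(evalWhere("a = 1 OR a = 2 AND b = 9", row))
}
```

Because `term` is called from `expr`, `a = 1 OR a = 2 AND b = 9` is read as `a = 1 OR (a = 2 AND b = 9)` rather than being evaluated left to right.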
chrislu
88d86374ea fix: Enable CGO in Docker build for pg_query_go dependency
The pg_query_go library requires CGO to be enabled as it wraps the libpg_query C library.
Added gcc and musl-dev dependencies to the Docker build for proper compilation.
2025-09-03 00:59:11 -07:00
chrislu
4d9de40c5c fmt 2025-09-03 00:48:09 -07:00
chrislu
42661ac110 fix tests 2025-09-03 00:47:08 -07:00
chrislu
991247facf fix tests 2025-09-03 00:40:03 -07:00
chrislu
e3e369c264 change to pg_query_go 2025-09-03 00:10:47 -07:00
chrislu
ba4a8b91d5 fmt 2025-09-02 22:31:53 -07:00
chrislu
59d6806146 fix empty spaces and coercion 2025-09-02 22:30:52 -07:00
Chris Lu
f29dd385cc Merge branch 'master' into add-sql-querying 2025-09-02 22:14:21 -07:00
chrislu
3fa7670557 fix todo 2025-09-02 22:12:47 -07:00
chrislu
687c5d6bfd fix tests 2025-09-02 21:21:59 -07:00
chrislu
e14a316aeb use schema instead of inferred result types 2025-09-02 20:59:13 -07:00
chrislu
316d1cdda7 address some comments 2025-09-02 19:58:41 -07:00
chrislu
a7eb178cec Update engine.go 2025-09-02 18:37:31 -07:00
chrislu
60066a6a4c read broker, logs, and parquet files 2025-09-02 18:15:26 -07:00
chrislu
59ec4eb68a address comments 2025-09-02 17:37:52 -07:00