Custom SQL
The Custom SQL tool allows you to write custom SQL queries for data analysis directly within your workflows. This powerful feature enables advanced data processing that goes beyond GOAT's built-in tools.
This is an advanced feature intended for users with SQL knowledge. Incorrect queries may cause workflows to fail or produce unexpected results. If you need help writing SQL queries, you can use AI assistants to help generate and explain the code.
Overview
The Custom SQL tool connects to GOAT's DuckDB backend, giving you direct access to query your datasets using SQL syntax. You can:
- Execute complex analytical queries
- Join multiple datasets
- Perform aggregations and statistical calculations
- Create derived datasets with custom logic
- Access advanced spatial functions
Using Custom SQL
Adding the Tool
Writing SQL Queries
In the configuration panel, you'll find a SQL editor where you can write your custom query:
SELECT
h.*,
p.population_density,
ST_Distance(h.geom, p.geom) AS distance_to_center
FROM input_1 h
JOIN input_2 p ON ST_Intersects(h.geom, p.geom)
WHERE p.population_density > 1000
ORDER BY distance_to_center
Input References
- input_1, input_2, input_3...: Reference your connected datasets using these table names
- The number corresponds to the connection order on the node
- You can connect up to 3 input datasets per Custom SQL node
Available Functions
The Custom SQL tool supports standard SQL functions plus spatial operations:
Spatial Functions:
ST_Distance()- Calculate distances between geometriesST_Intersects()- Check if geometries intersectST_Within()- Test if geometry is within anotherST_Buffer()- Create buffers around geometriesST_Area()- Calculate geometry areaST_Length()- Calculate line length
Analytical Functions:
AVG(),SUM(),COUNT()- Statistical aggregationsPERCENTILE_CONT()- Calculate percentilesROW_NUMBER(),RANK()- Window functionsCASE WHEN- Conditional logic
Query Validation
Preview button to see the first few rows of results.Examples
Basic Filtering and Selection
-- Select buildings within a certain area
SELECT building_type, height, geom
FROM input_1
WHERE building_type = 'residential'
AND height > 10
Spatial Join Analysis
-- Find all amenities within 500m of transit stops
SELECT
a.name as amenity_name,
a.amenity_type,
t.stop_name,
ST_Distance(a.geom, t.geom) as distance
FROM input_1 a
JOIN input_2 t ON ST_DWithin(a.geom, t.geom, 500)
ORDER BY distance
Aggregation by Area
-- Count points by administrative area
SELECT
admin.district_name,
COUNT(points.*) as point_count,
admin.geom
FROM input_1 points
RIGHT JOIN input_2 admin
ON ST_Within(points.geom, admin.geom)
GROUP BY admin.district_name, admin.geom
Best Practices
- Use spatial indexes by including geometric predicates in WHERE clauses
- Limit results during development with
LIMIT 100 - Test with small datasets first, then scale up
- Ensure geometry columns are properly formatted for spatial operations
- Cast data types explicitly when joining different datasets
- Check for NULL values in critical columns
Query Optimization
Use Spatial Predicates: Always include spatial filters like ST_DWithin() when possible to utilize spatial indexes.
Column Selection: Select only the columns you need rather than using SELECT *.
Proper Joins: Use appropriate join types (INNER, LEFT, RIGHT) based on your analysis needs.
Error Handling
Common issues and solutions:
- "Table not found": Ensure input datasets are properly connected
- "Column doesn't exist": Check column names in your input datasets
- "Geometry error": Verify geometry columns are valid and properly formatted
- "Timeout": Break complex queries into smaller steps or increase timeout
Output and Integration
The Custom SQL tool creates a new temporary layer containing your query results. You can:
- Connect the output to other workflow tools for further analysis
- Add an export node to save results as a permanent dataset
- Use the results in visualizations and styling
Custom SQL queries support workflow variables using the {{@variable_name}} syntax for parameterized queries.
Limitations
- Maximum of 3 input datasets per Custom SQL node
- Queries must return at least one geometry column for mapping
- Some advanced DuckDB functions may not be available
- Query execution time is limited by the configured timeout
For more complex analysis requirements, consider using multiple Custom SQL nodes or combining with other workflow tools.