CodeQL

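CodeQL is a declarative query language based on logic programming (the QL family), used by GitHub/Semmle for static code analysis. It extracts source code into a relational database of facts, and evaluating a query essentially means solving first-order logic formulas over those relations. Things to keep in mind when writing QL: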

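Everything is a set: a type is a set of values, and a predicate is a set of tuples.
from-where-select is not a loop; it describes a set of results that the engine solves for.
Every variable must be bound to a finite set of values, otherwise the compiler reports an error such as "not bound to a value".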

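Note: CodeQL has no ==, only =. When one side is not yet bound, = acts as an equality constraint (which binds it); when both sides are already bound, it tests whether they are equal.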

Data types

CodeQL is statically typed: every variable must be declared with a type, and a value can belong to several types at the same time.

Type	Description
int	Integer
float	Floating-point number
string	String
boolean	Boolean
date	Date

Date

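Represents a Gregorian date and time: year, month, day, hour, minute, second and millisecond. Every field is an integer; see the official documentation for the valid ranges.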

cql

from date start,date end
where start = "21/04/2026".toDate() and
end = "01/05/2026".toDate()
select start.daysTo(end)
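As a quick cross-check of the arithmetic, the sketch below computes the same day count with Java's java.time (plain Java, not CodeQL). It assumes that toDate() reads these strings as day/month/year, as the query above does, and that daysTo counts whole days. Under those assumptions the query should return 10.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

// Cross-check of the daysTo example using java.time (plain Java, not CodeQL).
class DateCheck {
    // Assumption: the strings use the same day/month/year layout as the query.
    static final DateTimeFormatter DMY = DateTimeFormatter.ofPattern("dd/MM/yyyy");

    // Whole days from start to end; negative if end is earlier than start.
    static long daysTo(String start, String end) {
        LocalDate s = LocalDate.parse(start, DMY);
        LocalDate e = LocalDate.parse(end, DMY);
        return ChronoUnit.DAYS.between(s, e);
    }

    public static void main(String[] args) {
        System.out.println(daysTo("21/04/2026", "01/05/2026")); // prints 10
    }
}
```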

Boolean

cql

from boolean a,boolean b
where a=true and b=true
select a.booleanAnd(b)
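The naive Java model below makes "everything is a set" and "from-where-select is not a loop" concrete. It takes every combination of values from the declared types, keeps the combinations that satisfy the where formula, and collects the selected values into a set. It only illustrates the meaning of the query. The CodeQL engine does not evaluate queries by brute-force enumeration, and the class name FromWhereSelect is made up for this sketch.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Naive meaning of: from boolean a, boolean b where a = true and b = true select a.booleanAnd(b)
// The type boolean is the finite set {true, false}. The result is the SET of selected
// values over every (a, b) in boolean x boolean that satisfies the where formula.
class FromWhereSelect {
    static final boolean[] BOOLEAN_TYPE = {true, false};

    static Set<Boolean> run() {
        Set<Boolean> results = new LinkedHashSet<>();      // query results are sets, no duplicates
        for (boolean a : BOOLEAN_TYPE) {                    // from boolean a
            for (boolean b : BOOLEAN_TYPE) {                // from boolean b
                boolean where = a == true && b == true;     // where a = true and b = true
                if (where) {
                    results.add(a && b);                    // select a.booleanAnd(b)
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [true]
    }
}
```

Of the four (a, b) combinations, only one survives the where clause, so the result set has exactly one row.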

String operations

cql

"hello".length()              // 5
"hello".toUpperCase()         // "HELLO"
"hello world".matches("%world%")   // true, SQL LIKE-style pattern (% = any sequence, _ = one character)
"abc".regexpMatch(".b.")      // true, regular expression that must match the whole string
"a,b,c".splitAt(",", 1)       // "b"

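matches and regexpMatch are easy to mix up. The sketch below models the LIKE-style rules of matches in plain Java by translating the pattern into a regular expression: % becomes .*, _ becomes ., every other character is literal, and the whole string must match. It illustrates those assumptions and is not the CodeQL implementation. The \ escape that QL also supports is left out.

```java
import java.util.regex.Pattern;

// Sketch of the LIKE-style semantics of QL's string.matches(pattern), in plain Java:
// '%' matches any sequence (including the empty one), '_' matches exactly one character,
// and the pattern must cover the WHOLE string. Backslash escapes are not modeled.
class LikeMatch {
    static boolean matches(String s, String likePattern) {
        StringBuilder regex = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            if (c == '%') {
                regex.append(".*");
            } else if (c == '_') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // every other character is literal
            }
        }
        // DOTALL lets '%' and '_' match line breaks too; matches() requires a full match
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("hello world", "%world%")); // prints true
        System.out.println(matches("hello world", "world%"));  // prints false (must match from the start)
        System.out.println(matches("abc", "a_c"));             // prints true
    }
}
```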
Expressions and operators

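The where part of a query is a first-order logic formula, built with:

Operator	Meaning
and	Conjunction
or	Disjunction
not	Negation (every variable inside must already be bound)
implies	Implication (A implies B is equivalent to not A or B)
if A then B else C	Conditional formula
forall(decl | cond | body)	Universal quantifier
exists(decl | cond | body)	Existential quantifier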
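Over a finite domain, the quantifiers and implies reduce to ordinary boolean checks. The plain-Java sketch below is a model, not CodeQL code, and the class Logic is hypothetical. It shows exists, forall and implies, plus the duality that forall(x | c | b) has the same value as not exists(x | c | not b).

```java
import java.util.List;
import java.util.function.Predicate;

// Plain-Java model of QL's logical operators over a FINITE domain (not CodeQL code).
class Logic {
    // A implies B  ==  not A or B
    static boolean implies(boolean a, boolean b) {
        return !a || b;
    }

    // exists(x | cond | body): some x satisfies cond and body
    static <T> boolean exists(List<T> domain, Predicate<T> cond, Predicate<T> body) {
        return domain.stream().anyMatch(x -> cond.test(x) && body.test(x));
    }

    // forall(x | cond | body): every x that satisfies cond also satisfies body
    static <T> boolean forall(List<T> domain, Predicate<T> cond, Predicate<T> body) {
        return domain.stream().allMatch(x -> implies(cond.test(x), body.test(x)));
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(1, 2, 3, 4);
        Predicate<Integer> even = x -> x % 2 == 0;
        Predicate<Integer> big = x -> x > 3;
        System.out.println(exists(xs, even, big));  // prints true (x = 4)
        System.out.println(forall(xs, even, big));  // prints false (x = 2)
        // Duality: forall(x | c | b) has the same value as not exists(x | c | not b)
        System.out.println(forall(xs, even, big) == !exists(xs, even, big.negate())); // prints true
    }
}
```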

exists

Use it when an intermediate variable should not be exposed in the from clause:

cql

from Method m
where exists(MethodCall call |
	call.getMethod() = m and 
	call.getEnclosingCallable().getName() = "transform"
)
select m
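The toy model below shows what exists buys you. The call variable is used only inside the condition, so only m reaches the result, and each m appears once no matter how many calls match. It is plain Java (16+, for the record), not CodeQL. Call and the sample data are hypothetical stand-ins for MethodCall facts.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Toy model of the exists query (plain Java 16+, not CodeQL). Each Call stands for one
// MethodCall: caller ~ getEnclosingCallable().getName(), callee ~ getMethod().
// The sample calls are hypothetical, loosely based on the InvokerTransformer snippet.
class ExistsModel {
    record Call(String caller, String callee) {}

    // from Method m where exists(MethodCall call | call.getMethod() = m and caller is "transform") select m
    static Set<String> calledFromTransform(List<Call> calls) {
        return calls.stream()
                .filter(call -> call.caller().equals("transform")) // the condition on call
                .map(Call::callee)                                // only m is kept; call is projected away
                .collect(Collectors.toSet());                     // a set: each m appears once
    }

    public static void main(String[] args) {
        List<Call> calls = List.of(
                new Call("transform", "getClass"),
                new Call("transform", "getMethod"),
                new Call("transform", "invoke"),
                new Call("readObject", "transform"));
        System.out.println(calledFromTransform(calls)); // getClass, getMethod, invoke (in any order)
    }
}
```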

forall

Expresses "everything that satisfies X also satisfies Y":

cql

from Method m 
where m.getNumberOfParameters() > 0 and 
	forall(Parameter p | 
		p = m.getAParameter() | 
		p.getType().getName() = "int" 
	) 
select m, "all parameters are int"
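The extra condition m.getNumberOfParameters() > 0 matters because a forall whose range is empty is vacuously true. Without it, every method with no parameters would be reported as "all parameters are int". The plain-Java model below shows the effect with Stream.allMatch, which also returns true for an empty stream. It needs Java 16+, and the Method record is a hypothetical stand-in, not the CodeQL class.

```java
import java.util.List;

// Toy model (plain Java 16+, not CodeQL) of why the forall query also checks
// getNumberOfParameters() > 0: a forall over an empty set is vacuously true.
class ForallModel {
    record Method(String name, List<String> paramTypes) {}

    // forall(Parameter p | p = m.getAParameter() | p.getType().getName() = "int")
    static boolean allParamsInt(Method m) {
        return m.paramTypes().stream().allMatch(t -> t.equals("int")); // true for an empty list
    }

    // The full where clause: at least one parameter, and all of them are int
    static boolean reported(Method m) {
        return m.paramTypes().size() > 0 && allParamsInt(m);
    }

    public static void main(String[] args) {
        Method noArgs = new Method("run", List.of());
        Method ints = new Method("add", List.of("int", "int"));
        Method mixed = new Method("put", List.of("int", "String"));
        System.out.println(allParamsInt(noArgs)); // prints true (vacuous truth)
        System.out.println(reported(noArgs));     // prints false
        System.out.println(reported(ints));       // prints true
        System.out.println(reported(mixed));      // prints false
    }
}
```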

Not

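Negation cannot bind variables: every variable used inside not must already be bound outside it, otherwise compilation fails.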

cql

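// Wrong: i has the infinite type int, and not does not bind it ("i is not bound to a value")
from int i
where not i = 1
select i
// Right: bind i to a finite range first, then negate
from int i
where i in [0 .. 5] and not i = 1
select i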

Predicates

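A predicate describes a relation between its parameters, i.e. a set of tuples. Predicate names must start with a lowercase letter.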

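Kinds of predicates

Non-member predicates: defined outside any class and used on their own
Member predicates: defined inside a class; this refers to the current instance
Characteristic predicates: the predicate in a class that has the same name as the class; it defines which values belong to the class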

cql

import java

// Define the concept "call to a dangerous sink method"
class DangerousSink extends MethodCall {
  string riskType;

  DangerousSink() {          // characteristic predicate: what counts as a DangerousSink
    this.getMethod().getName() = "invoke" and
    this.getMethod().getDeclaringType().getName() = "Method" and
    riskType = "reflective invocation"
    or
    this.getMethod().getName() = "exec" and
    this.getMethod().getDeclaringType().getName() = "Runtime" and
    riskType = "command execution"
    or
    this.getMethod().getName() = "lookup" and
    riskType = "JNDI lookup"
  }

  string getRiskType() { result = riskType }  // member predicate: returns the risk type
}

// Using the class keeps the query short
from DangerousSink sink
select sink,
  sink.getRiskType(),
  sink.getEnclosingCallable().getDeclaringType().getQualifiedName()

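By comparison, without a class all the conditions pile up in the where clause, which becomes long and hard to maintain. In essence, a class just gives a name to the set of elements that satisfy some conditions.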

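Predicates without result

Declared with the keyword predicate. Such a predicate has no result value: it simply holds or does not hold for its arguments, so it is used as a condition (in where or inside other formulas).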

cql

predicate isCity(string city){
	city = "Beijing" or city = "Shanghai"
}

Predicates with result

Declared with the result type in place of predicate; can be used in where or in select:

cql

int addOne(int i){
	result = i+1 and i in [1..10]
}

The keyword result is the implicit result variable; no return statement is needed.
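Seen as sets of tuples, isCity is the set {"Beijing", "Shanghai"}, and addOne is the finite set of pairs (i, i + 1) for i from 1 to 10. The plain-Java sketch below builds both sets so the tuple view is concrete. It is a model, not CodeQL, and the class name is made up. Note that addOne(11) has no result at all, because 11 is outside [1 .. 10].

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Plain-Java view of the two predicates above as sets of tuples (not CodeQL code).
class PredicatesAsSets {
    // predicate isCity(string city): a set of 1-tuples; a call holds iff the value is in the set
    static final Set<String> IS_CITY = Set.of("Beijing", "Shanghai");

    // int addOne(int i): a set of (i, result) tuples; i in [1 .. 10] keeps it finite
    static Set<List<Integer>> addOne() {
        Set<List<Integer>> tuples = new HashSet<>();
        for (int i = 1; i <= 10; i++) {
            tuples.add(List.of(i, i + 1)); // result = i + 1
        }
        return tuples;
    }

    public static void main(String[] args) {
        System.out.println(IS_CITY.contains("Beijing"));        // prints true
        System.out.println(addOne().size());                    // prints 10
        System.out.println(addOne().contains(List.of(10, 11))); // prints true
    }
}
```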

Recursive predicates

QL supports recursive predicates (evaluated as a least fixed point). A common use is following method-call chains:

cql

import java

predicate canReach(Callable src, Callable dest) {
	// Base case: src calls dest directly
	exists(MethodCall call |
		call.getEnclosingCallable() = src and
		call.getMethod() = dest
	)
	or
	// Recursive case: src calls some mid, and mid can reach dest
	exists(Callable mid, MethodCall call |
		call.getEnclosingCallable() = src and
		call.getMethod() = mid and
		canReach(mid, dest)
	)
}

Base case: if the body of src contains a MethodCall that calls dest directly, src can reach dest
Recursive case: if src calls some intermediate method mid and mid can reach dest, src can also reach dest
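The "least fixed point" wording can be made concrete. Start from an empty canReach relation and keep adding pairs from the base case and the recursive case until an iteration adds nothing new. The plain-Java sketch below does exactly that on a hypothetical call graph that contains a cycle, which also shows why recursion through cycles still terminates. It models the semantics only; the real CodeQL evaluator is optimized and does not run this naive loop.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Least-fixed-point evaluation of canReach, modeled in plain Java (not CodeQL).
// Start from the empty relation and apply "base case OR recursive case" until no new
// (src, dest) pair appears. The call graph is hypothetical and contains a cycle
// (b -> c -> b) to show that the iteration still terminates.
class CanReach {
    static Set<List<String>> canReach(Map<String, List<String>> calls) {
        Set<List<String>> reach = new HashSet<>();
        while (true) {
            Set<List<String>> next = new HashSet<>(reach);
            for (Map.Entry<String, List<String>> e : calls.entrySet()) {
                String src = e.getKey();
                for (String mid : e.getValue()) {
                    next.add(List.of(src, mid));                    // base case: src calls mid directly
                    for (List<String> pair : reach) {               // recursive case:
                        if (pair.get(0).equals(mid)) {              //   src calls mid and mid reaches dest
                            next.add(List.of(src, pair.get(1)));
                        }
                    }
                }
            }
            if (next.equals(reach)) {
                return reach;                                       // fixed point reached
            }
            reach = next;
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> calls = Map.of(
                "a", List.of("b"),
                "b", List.of("c"),
                "c", List.of("b", "d"));
        Set<List<String>> r = canReach(calls);
        System.out.println(r.contains(List.of("a", "d"))); // prints true: a -> b -> c -> d
        System.out.println(r.size());                      // prints 9
    }
}
```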

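Binding sets

A predicate must be evaluable in finite time, so every parameter needs a finite set of candidate values. When you cannot write such a finite range, use bindingset to declare that the caller guarantees the parameter is already bound.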

cql

// Wrong: i ranges over infinitely many ints, so the compiler reports "not bound to a value"
int addOne(int i) { result = i + 1 and i > 0 }

// Right: declare that the caller binds i
bindingset[i]
int addOne(int i) { result = i + 1 and i > 0 }

Semantics of several binding sets:

cql

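// Binding sets on separate lines = OR: it is enough that x or y is bound
bindingset[x]
bindingset[y]
predicate plusOne(int x, int y) { x + 1 = y }

// Both parameters in one bindingset = AND: x and y must both be bound
// (an alternative declaration, not meant to coexist with the one above)
bindingset[x, y]
predicate plusOne(int x, int y) { x + 1 = y }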

Queries

SELECT

cql

[from] /* variable declarations */
[where] /* logical formula */
select /* expressions */

Common keywords:

as <n>: names a result column so it can be reused later in the same select
order by col [asc|desc]: sorts the results

cql

from int x, int y
where x = 3 and y in [0 .. 2]
select x, y, x * y as product, "product: " + product
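For reference, the sketch below enumerates the rows that this select should produce, written in plain Java (not CodeQL). It assumes QL's string + int concatenation renders the integer in decimal. Without order by, CodeQL does not promise any particular row order.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java enumeration of the rows produced by:
//   from int x, int y where x = 3 and y in [0 .. 2]
//   select x, y, x * y as product, "product: " + product
class SelectRows {
    static List<String> rows() {
        List<String> rows = new ArrayList<>();
        int x = 3;                                   // x = 3
        for (int y = 0; y <= 2; y++) {               // y in [0 .. 2]
            int product = x * y;                     // x * y as product
            String label = "product: " + product;    // reuses the named column
            rows.add(x + " | " + y + " | " + product + " | " + label);
        }
        return rows;
    }

    public static void main(String[] args) {
        rows().forEach(System.out::println);
        // 3 | 0 | 0 | product: 0
        // 3 | 1 | 3 | product: 3
        // 3 | 2 | 6 | product: 6
    }
}
```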

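Query predicates

Adding the query annotation to a non-member predicate makes its tuples appear as a separate result set of the query, while it can still be called like an ordinary predicate: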

cql

query int getProduct(int x, int y) {
  x = 3 and y in [0 .. 2] and result = x * y
}

Classes

Definition

cql

class ClassName [extends Parent1,Parent2,...]{
	// characteristic predicate (like a constructor) + member predicates (like methods) + fields (like member variables)
}

Example:

cql

class OneTwoThree extends int {
  OneTwoThree() { // characteristic predicate
    this = 1 or this = 2 or this = 3
  }
  string getAString() { // member predicate with result
    result = "One, two or three: " + this.toString()
  }
  predicate isEven() { // member predicate without result
    this = 2
  }
}

Fields

A field is a variable declared in the class body. It must be constrained (bound) in the characteristic predicate.

cql

class SmallInt extends int {
  SmallInt() { this in [1 .. 10] }
}

class DivisibleInt extends SmallInt {
  SmallInt divisor;                       // field
  DivisibleInt() { this % divisor = 0 }   // the characteristic predicate binds divisor
  SmallInt getADivisor() { result = divisor }
}
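A field gives each value of the class an extra related value, and one value can be related to several of them. The plain-Java model below mirrors SmallInt and DivisibleInt, so getADivisor(6) yields four results. It is not CodeQL, and the class name is invented. Because 1 divides every number, every SmallInt is also a DivisibleInt here.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model (not CodeQL) of SmallInt and DivisibleInt above. A value v is a
// DivisibleInt if SOME divisor in SmallInt satisfies v % divisor = 0, and getADivisor()
// returns every such divisor, so one value can yield several results.
class DivisibleIntModel {
    static boolean isSmallInt(int v) {
        return v >= 1 && v <= 10;                    // SmallInt() { this in [1 .. 10] }
    }

    static List<Integer> getADivisor(int v) {
        List<Integer> divisors = new ArrayList<>();
        if (!isSmallInt(v)) {
            return divisors;                         // not a SmallInt, so not a DivisibleInt
        }
        for (int d = 1; d <= 10; d++) {              // SmallInt divisor
            if (v % d == 0) {                        // this % divisor = 0
                divisors.add(d);
            }
        }
        return divisors;
    }

    public static void main(String[] args) {
        System.out.println(getADivisor(6));  // prints [1, 2, 3, 6]
        System.out.println(getADivisor(7));  // prints [1, 7]
        System.out.println(getADivisor(11)); // prints []
    }
}
```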

Example

cql

import java

// A class that has a dangerous setter; the field remembers which setter it is
class DangerousBean extends Class {
  Method dangerousSetter;

  DangerousBean() {
    dangerousSetter.getDeclaringType() = this and
    dangerousSetter.getName().matches("set%") and
    dangerousSetter.isPublic() and
    exists(MethodCall call |
      call.getEnclosingCallable() = dangerousSetter and
      call.getMethod().getName() = "invoke"
    )
  }

  Method getDangerousSetter() { result = dangerousSetter }
}

from DangerousBean bean
select bean.getQualifiedName(), bean.getDangerousSetter().getName()

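Abstract classes and their subclasses

Concrete class: defined by restricting the set of values in its characteristic predicate
Abstract class: declared with abstract; its values are exactly the union of all of its non-abstract subclasses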

cql

import java

abstract class SecuritySink extends MethodCall { }

class SqlInjectSink extends SecuritySink {
	SqlInjectSink() {
		this.getMethod().getDeclaringType().getASupertype*().getName() = "Statement" and
		this.getMethod().getName().matches("execute%")
	}
}

class CommandInjectSink extends SecuritySink {
	CommandInjectSink() {
		this.getMethod().getDeclaringType().getName() = "Runtime" and
		this.getMethod().getName() = "exec"
	}
}
// The values of SecuritySink = SqlInjectSink ∪ CommandInjectSink
from SecuritySink sink
select sink, "risky call"

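Code walkthrough:
1. this.getMethod(): this is the MethodCall itself, so this returns the method being called (the callee)
2. getDeclaringType()
  - returns the type that declares the method (for Runtime.exec, that is Runtime)
3. getASupertype() / getASupertype*() / getASupertype+()
  - without a suffix: a direct supertype (superclass or implemented interface)
  - *: the type itself plus all of its direct and indirect supertypes
  - +: all direct and indirect supertypes
  - hence getASupertype*().getName() = "Statement" matches methods declared in Statement itself or in any subtype of it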
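The difference between the three forms is easiest to see on a small hierarchy. The plain-Java sketch below (not CodeQL) uses a hypothetical, simplified set of direct supertype edges. The real JDK types such as java.sql.Statement have more supertypes than shown.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Plain-Java model (not CodeQL) of getASupertype(), getASupertype+() and getASupertype*()
// on a small, hypothetical, simplified hierarchy (the real JDK types have more supertypes).
class SupertypeClosure {
    // getASupertype(): direct supertype edges only
    static final Map<String, List<String>> DIRECT = Map.of(
            "PreparedStatement", List.of("Statement"),
            "Statement", List.of("AutoCloseable"),
            "AutoCloseable", List.of());

    // getASupertype+(): one or more steps (transitive closure)
    static Set<String> plus(String type) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> work = new ArrayDeque<>(DIRECT.getOrDefault(type, List.of()));
        while (!work.isEmpty()) {
            String t = work.pop();
            if (seen.add(t)) {
                work.addAll(DIRECT.getOrDefault(t, List.of()));
            }
        }
        return seen;
    }

    // getASupertype*(): zero or more steps (reflexive transitive closure)
    static Set<String> star(String type) {
        Set<String> result = new LinkedHashSet<>();
        result.add(type);
        result.addAll(plus(type));
        return result;
    }

    public static void main(String[] args) {
        System.out.println(DIRECT.get("PreparedStatement")); // prints [Statement]
        System.out.println(plus("PreparedStatement"));       // prints [Statement, AutoCloseable]
        System.out.println(star("PreparedStatement"));       // prints [PreparedStatement, Statement, AutoCloseable]
    }
}
```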

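Multiple inheritance and overriding

A class can extend several types at once (class C extends A, B); its values must then belong to all of them. Use override to redefine a member predicate: for values that belong to the subclass, the overriding definition replaces the inherited one. In the example below, describe() returns only "the special one" for 1, and "a number" for 2 and 3: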

cql

class OneTwoThree extends int {
	OneTwoThree() { this in [1 .. 3] }
	string describe() { result = "a number" }
}

class SpecialOne extends OneTwoThree {
	SpecialOne() { this = 1 }
	override string describe() { result = "the special one" }
}

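Type compatibility

A class cannot extend itself, and it cannot extend a final class
X extends Y means that the set of values of X ⊆ the set of values of Y
instanceof performs a type check and does not create a subtype relation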

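Modules

Kinds of modules

File modules: every .ql/.qll file implicitly defines a module with the same name
Library modules: .qll files; they cannot contain a select clause and are meant to be imported by other files
Query modules: .ql files; they must contain a select clause or a query predicate
Explicit modules: defined inside a file with module Name { ... }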

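Parameterized modules

This is the module ... implements DataFlow::ConfigSig you see in taint-analysis queries. DataFlow::ConfigSig is a signature: your module implements it to tell the engine "what is a source and what is a sink", and it is then passed as the argument of the parameterized module TaintTracking::Global: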

cql

import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking

module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { none() /* placeholder: define sources here */ }
  predicate isSink(DataFlow::Node sink) { none() /* placeholder: define sinks here */ }
}
module MyFlow = TaintTracking::Global<MyConfig>;

Common AST nodes

The types used most often in CodeQL for Java:

Type	Meaning	Common methods
`Class`	Class	`getName()`, `getQualifiedName()`, `getASupertype*()`, `getAMethod()`
`Method`	Method	`getName()`, `getDeclaringType()`, `getAParameter()`, `getNumberOfParameters()`
`Callable`	Method or constructor	More general than `Method`
`MethodCall`	Method call	`getMethod()`, `getEnclosingCallable()`, `getAnArgument()`, `getQualifier()`
`Parameter`	Method parameter	`getType()`, `getName()`
`Field`	Field	`getName()`, `getType()`, `getDeclaringType()`
`Expr`	Expression	Superclass of all expressions
`VarAccess`	Variable access	`getVariable()`
`StringLiteral`	String literal	`getValue()`

Methods explained

java

class InvokerTransformer {
    public Object transform(Object input) {      // ← this is a Callable (a method)
        Class cls = input.getClass();              // ← this is a MethodCall (call 1)
        Method method = cls.getMethod(...);         // ← this is a MethodCall (call 2)
        return method.invoke(input, ...);           // ← this is a MethodCall (call 3)
    }
}

For call 3, method.invoke(input, ...):

call: the call method.invoke(input, ...) itself
call.getMethod(): returns Method.invoke, i.e. the method being called
call.getEnclosingCallable(): returns InvokerTransformer.transform, i.e. the method this line of code is written in

Data flow and taint tracking

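	DataFlow	TaintTracking
What it tracks	Exact value propagation (x = y)	Also derived data (z = x + "abc")
Typical use	Hard-coded secrets, exact value tracking	SQL injection, XSS, command injection, etc.
Module	`DataFlow::Global<Config>`	`TaintTracking::Global<Config>`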
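The distinction in the table can be modeled as graph reachability with two kinds of edges. In the plain-Java sketch below, DataFlow follows only exact value copies, while TaintTracking also follows steps that derive new data. As a result, only TaintTracking connects input to the SQL sink through z = x + "abc". The sketch needs Java 16+, and its node names and edges are hypothetical; real CodeQL nodes and steps are much richer.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy reachability model (plain Java 16+, not CodeQL) of the DataFlow vs TaintTracking
// difference. Nodes are variable names; a VALUE edge is an exact copy (x = y) and a
// TAINT edge derives new data (z = x + "abc"). DataFlow follows VALUE edges only;
// TaintTracking follows both kinds.
class FlowVsTaint {
    enum Kind { VALUE, TAINT }

    record Edge(String from, String to, Kind kind) {}

    static boolean reaches(List<Edge> edges, String source, String sink, boolean taintTracking) {
        Set<String> seen = new HashSet<>(List.of(source));
        Deque<String> work = new ArrayDeque<>(List.of(source));
        while (!work.isEmpty()) {
            String node = work.pop();
            for (Edge e : edges) {
                boolean followed = e.kind() == Kind.VALUE || taintTracking;
                if (followed && e.from().equals(node) && seen.add(e.to())) {
                    work.push(e.to());
                }
            }
        }
        return seen.contains(sink);
    }

    public static void main(String[] args) {
        // x = input; z = x + "abc"; z is passed to an SQL execute call (node "sql")
        List<Edge> edges = List.of(
                new Edge("input", "x", Kind.VALUE),
                new Edge("x", "z", Kind.TAINT),
                new Edge("z", "sql", Kind.VALUE));
        System.out.println(reaches(edges, "input", "sql", false)); // prints false (DataFlow)
        System.out.println(reaches(edges, "input", "sql", true));  // prints true (TaintTracking)
    }
}
```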

Global taint analysis

cql

import java
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources

module MyConfig implements DataFlow::ConfigSig {

  // Source: where user input comes from
  predicate isSource(DataFlow::Node source) {
    source instanceof RemoteFlowSource
  }

  // Sink: where arriving data becomes dangerous
  predicate isSink(DataFlow::Node sink) {
    exists(MethodCall call |
      call.getMethod().getDeclaringType().getASupertype*().getName() = "Statement" and
      call.getMethod().getName().matches("execute%") and
      sink.asExpr() = call.getAnArgument()
    )
  }

  // Optional: extra propagation steps (e.g. framework-specific data transfer)
  // predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) { ... }

  // Optional: barriers (sanitizers); data that passes through them is no longer dangerous
  // predicate isBarrier(DataFlow::Node node) { ... }
}

module MyFlow = TaintTracking::Global<MyConfig>;

from DataFlow::Node source, DataFlow::Node sink
where MyFlow::flow(source, sink)
select sink, "User input from $@ flows into an SQL query.", source, "user input"

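Line by line:

DataFlow::Node: a data-flow node, i.e. any point that data may pass through
source instanceof RemoteFlowSource: a built-in CodeQL class covering remote inputs such as HTTP request parameters and headers
sink.asExpr() = call.getAnArgument(): views the node as an expression and checks whether it is an argument of a dangerous method call
MyFlow::flow(source, sink): the core question, "is there a data-flow path from source to sink?"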

Metadata

Standard .ql file header:

cql

/**
 * @name Command injection via Runtime.exec
 * @description User-controlled input flows into Runtime.exec.
 * @kind path-problem
 * @problem.severity error
 * @security-severity 9.8
 * @precision high
 * @id java/command-injection-demo
 * @tags security
 *       external/cwe/cwe-078
 */

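Common @kind values:

problem: an ordinary alert; select needs at least two columns (element + message)
path-problem: an alert with the full data-flow path; the query imports the flow module's PathGraph and selects the sink element, the source and sink path nodes, and a message
table: plain tabular output, useful for debugging

If the metadata is missing or the number of select columns does not fit, VS Code reports Expected at least 2 columns or INVALID_RESULT_PATTERNS; adding a column with a message string fixes it.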
