Syntax-Aware Fill-in-the-Middle (SAFIM) is a benchmark for evaluating Large Language Models (LLMs) on the code Fill-in-the-Middle (FIM) task, in which a model generates a missing span of code given the code before and after it. SAFIM has three subtasks: Algorithmic Block Completion, Control-Flow Expression Completion, and API Function Call Completion. SAFIM is sourced from code submitted between April 2022 and January 2023 to minimize the impact of data contamination on evaluation results.
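A FIM example can be viewed as a three-way split of a source file into a prefix, a masked middle, and a suffix. The model receives the prefix and suffix and must produce the middle. The Python sketch below is illustrative only. The helpers `split_for_fim` and `make_fim_prompt` are hypothetical. The sentinel strings `<PRE>`, `<SUF>`, and `<MID>` are placeholders, because each model defines its own FIM tokens and prompt ordering.

```python
def split_for_fim(code: str, start: int, end: int) -> tuple[str, str, str]:
    """Split `code` into (prefix, middle, suffix) around code[start:end]."""
    if not 0 <= start <= end <= len(code):
        raise ValueError("invalid span")
    return code[:start], code[start:end], code[end:]


# Hypothetical sentinel strings; real models each define their own FIM tokens.
PRE, SUF, MID = "<PRE>", "<SUF>", "<MID>"


def make_fim_prompt(prefix: str, suffix: str) -> str:
    """Assemble a prefix-suffix-middle prompt; the model would continue after MID."""
    return f"{PRE}{prefix}{SUF}{suffix}{MID}"


code = "result = 1\nwhile b > 0:\n    b //= 2\n"
start = code.index("b > 0")  # mask the loop condition
prefix, middle, suffix = split_for_fim(code, start, start + len("b > 0"))
print(make_fim_prompt(prefix, suffix))
print(middle)  # prints: b > 0
```

The middle is kept separately as the ground-truth completion against which a model's output would be compared.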
Algorithmic Block Completion
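```python
# Calculate max path sum in grid,
# only right or down moves allowed
import numpy as np

# `a` is the input grid (a list of equal-length rows); entries are assumed
# non-negative, since the zero border row/column would otherwise win the max
n, m = len(a), len(a[0])
f = np.zeros((n + 1, m + 1))  # f[i, j]: best path sum ending at a[i-1][j-1]
for i in range(1, n + 1):
    for j in range(1, m + 1):
        v = max(f[i - 1, j], f[i, j - 1])  # better of the cell above / to the left
        f[i, j] = v + a[i - 1][j - 1]
print(f[n, m])
```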
Control-Flow Expression Completion
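```python
# Calculate (a ^ b) mod m for
# large positive integers a, b, m
# (binary exponentiation: square a and halve b each step)
result = 1
while b > 0:
    if b % 2:  # lowest remaining bit of b is set
        result = (result * a) % m
    a = (a * a) % m
    b //= 2
print(result)
```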
API Function Call Completion
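```python
# Define word embedding & learned
# positional embedding layers
import torch.nn as nn


class Embeddings(nn.Module):  # hypothetical wrapper; the excerpt shows only __init__'s body
    def __init__(self, args):
        super().__init__()
        d_model = args.model_dim
        n_words = args.vocab_size
        max_len = args.max_src_len
        self.word_emb = nn.Embedding(n_words, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)
```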
Each example pairs a problem description with a corresponding code file in Python, Java, C++, or C#. In each code file, one AST structure is masked (underlined in the figure). Depending on the subtask, that structure is a code block, a control-flow expression, or an API function call. Models must generate the masked code.
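The masking step can be illustrated with Python's standard `ast` module, using the `end_lineno`/`end_col_offset` node attributes available since Python 3.8. This sketch is not SAFIM's actual construction pipeline. The helper names `offsets` and `mask` are hypothetical, and so are the node-selection rules:

- A block is the body of the first loop.
- A control-flow expression is the condition of the first `if`/`while`.
- An API call is the first call of any kind.

```python
import ast


def offsets(source: str, lineno: int, col: int) -> int:
    """Map a 1-based line and 0-based column to a character offset.

    Assumes ASCII source: ast column offsets count UTF-8 bytes, not characters.
    """
    lines = source.splitlines(keepends=True)
    return sum(len(line) for line in lines[: lineno - 1]) + col


def mask(source: str, kind: str) -> tuple[str, str, str]:
    """Return (prefix, middle, suffix) with one AST structure of `kind` masked."""
    for node in ast.walk(ast.parse(source)):
        if kind == "block" and isinstance(node, (ast.For, ast.While)):
            first, last = node.body[0], node.body[-1]  # the whole loop body
        elif kind == "control" and isinstance(node, (ast.If, ast.While)):
            first = last = node.test  # the condition expression
        elif kind == "api" and isinstance(node, ast.Call):
            first = last = node  # callee plus arguments
        else:
            continue
        start = offsets(source, first.lineno, first.col_offset)
        end = offsets(source, last.end_lineno, last.end_col_offset)
        return source[:start], source[start:end], source[end:]
    raise ValueError(f"no {kind!r} node found")


SOURCE = (
    "result = 1\n"
    "while b > 0:\n"
    "    if b % 2:\n"
    "        result = (result * a) % m\n"
    "    a = (a * a) % m\n"
    "    b //= 2\n"
    "print(result)\n"
)

prefix, middle, suffix = mask(SOURCE, "control")
print(middle)  # prints: b > 0
```

Calling `mask(SOURCE, "block")` instead masks the whole `while` body, and `mask(SOURCE, "api")` masks `print(result)`. Treating every call as an API call is a simplification made for this sketch.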
| # | Name | Algorithmic | Control-Flow | API | Average | Cutoff |
|---|---|---|---|---|---|---|