Method Alternatives

This page documents alternative extraction approaches evaluated during development. These runs were not used in the final dataset, but are retained here for transparency and comparison.

Final dataset: regex1-2025-06-19 (regex-gated mentions_cyber plus LLM labeling). See Validation for the metrics used in the published results.

Method Comparison

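This comparison reports overall precision, recall, F1, and accuracy for each processing mode/run, all scored against the same ground truth and sentence-matching cache. Accuracy here is TP / (TP+FP+FN), which is equivalent to the Jaccard index and does not count true negatives. GT is the number of ground-truth labels (TP+FN) and AI is the number of labels the run produced (TP+FP).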

Run	Precision	Recall	F1	Accuracy	TP	FP	FN	GT	AI
regex_test	64.5%	85.3%	73.5%	58.1%	151	83	26	177	234
regex1-2025-06-19	64.4%	84.7%	73.2%	57.7%	150	83	27	177	233
analysis_openai	22.6%	59.9%	32.8%	19.6%	106	364	71	177	470
analysis_gemini	12.8%	74.6%	21.8%	12.3%	132	900	45	177	1,032
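
The percentage columns in the table above follow directly from the TP, FP, and FN counts. As a minimal sketch (the `Counts` class and `metrics` function are illustrative names, not part of the project's code), the regex_test row can be reproduced like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Confusion counts for one run (hypothetical helper type)."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives


def metrics(c: Counts) -> dict[str, float]:
    """Compute the four reported metrics from raw counts.

    Assumes every denominator is non-zero, which holds for the
    overall rows in the table.
    """
    return {
        "precision": c.tp / (c.tp + c.fp),
        "recall": c.tp / (c.tp + c.fn),
        "f1": 2 * c.tp / (2 * c.tp + c.fp + c.fn),
        # "Accuracy" as defined on this page: true negatives are not counted.
        "accuracy": c.tp / (c.tp + c.fp + c.fn),
    }


# Counts taken from the regex_test row.
regex_test = metrics(Counts(tp=151, fp=83, fn=26))
print({name: f"{value:.1%}" for name, value in regex_test.items()})
# {'precision': '64.5%', 'recall': '85.3%', 'f1': '73.5%', 'accuracy': '58.1%'}
```

Because this accuracy ignores true negatives, it is always at or below both precision and recall, which is why it sits well under F1 in every row.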

Method Comparison by Tag

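Per-tag precision, recall, F1, and accuracy for each run, limited to the four core labels (accuracy = TP / (TP+FP+FN)). The specificity tag has no ground-truth instances (GT = 0), so every specificity prediction counts as a false positive and recall, which would be 0/0, is shown as 0.0%.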

Run	Tag	Precision	Recall	F1	Accuracy	TP	FP	FN	GT	AI
regex_test	mentions_cyber	64.8%	85.8%	73.8%	58.5%	151	82	25	176	233
regex_test	mentions_board	53.8%	60.9%	57.1%	40.0%	14	12	9	23	26
regex_test	regulatory_reference	62.5%	50.0%	55.6%	38.5%	5	3	5	10	8
regex_test	specificity	0.0%	0.0%	0.0%	0.0%	0	1	0	0	1
regex1-2025-06-19	mentions_cyber	64.2%	84.7%	73.0%	57.5%	149	83	27	176	232
regex1-2025-06-19	mentions_board	47.2%	73.9%	57.6%	40.5%	17	19	6	23	36
regex1-2025-06-19	regulatory_reference	71.4%	50.0%	58.8%	41.7%	5	2	5	10	7
regex1-2025-06-19	specificity	0.0%	0.0%	0.0%	0.0%	0	29	0	0	29
analysis_openai	mentions_cyber	25.2%	59.7%	35.4%	21.5%	105	312	71	176	417
analysis_openai	mentions_board	20.6%	56.5%	30.2%	17.8%	13	50	10	23	63
analysis_openai	regulatory_reference	16.7%	50.0%	25.0%	14.3%	5	25	5	10	30
analysis_openai	specificity	0.0%	0.0%	0.0%	0.0%	0	173	0	0	173
analysis_gemini	mentions_cyber	12.8%	75.0%	21.9%	12.3%	132	897	44	176	1,029
analysis_gemini	mentions_board	12.5%	87.0%	21.9%	12.3%	20	140	3	23	160
analysis_gemini	regulatory_reference	8.0%	80.0%	14.5%	7.8%	8	92	2	10	100
analysis_gemini	specificity	0.0%	0.0%	0.0%	0.0%	0	634	0	0	634
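
For the specificity rows, TP + FN is zero, so recall has no defined value even though the table prints 0.0%. How the original evaluation handles this case is not stated here. One possible approach, sketched below with hypothetical helper names (`safe_ratio`, `tag_metrics`), is to return `None` for a zero denominator so that "undefined" stays distinguishable from a genuine zero:

```python
from typing import Optional


def safe_ratio(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator / denominator, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def tag_metrics(tp: int, fp: int, fn: int) -> dict[str, Optional[float]]:
    """Per-tag metrics that keep undefined values as None (illustrative only)."""
    return {
        "precision": safe_ratio(tp, tp + fp),
        "recall": safe_ratio(tp, tp + fn),
        "f1": safe_ratio(2 * tp, 2 * tp + fp + fn),
        "accuracy": safe_ratio(tp, tp + fp + fn),
    }


# Counts from the regex1-2025-06-19 specificity row: 29 predictions, no ground truth.
specificity = tag_metrics(tp=0, fp=29, fn=0)
print(specificity["precision"])  # 0.0
print(specificity["recall"])     # None
```

A report built this way could render `None` as "n/a" rather than 0.0%, so a tag with no ground truth is not read as a tag the run failed to recall.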