Best-of-N Analysis: Diminishing Returns

10 Path A prompts × 100 candidates, 29 ranking methods

Methodology: Ranking Method Formulas & Text Prompts

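All methods use reward = (1 − WER) × max(score, 0), where WER is the word error rate and negative scores are clamped to zero. The score varies per method as described in the table below. Keys containing L or S (clap_l*/clap_s*, *_L/*_S) denote the VoiceCLAP-Large and VoiceCLAP-Small models respectively.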
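The reward formula above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the function name `reward` and the example inputs are assumptions made for demonstration only.

```python
def reward(wer: float, score: float) -> float:
    """reward = (1 - WER) * max(score, 0), as stated in the methodology."""
    # Negative scores are clamped to zero before the intelligibility weighting.
    return (1.0 - wer) * max(score, 0.0)


# Illustrative inputs only (not taken from the experiment).
print(reward(wer=0.10, score=0.80))   # intelligible candidate with a positive score
print(reward(wer=0.10, score=-0.25))  # negative score is clamped, reward is zero
print(reward(wer=1.00, score=0.80))   # WER of 1 zeroes the reward regardless of score
```

WER can exceed 1 when a transcript contains many insertions, in which case the formula as written yields a negative reward. The source does not say whether WER is capped, so the sketch leaves it uncapped.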

#	Key	Score Formula	Text Prompt(s)
0	standard	Content Enjoyment	—
1	clap_lq	cos(audio, quality_text)	"pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
2	clap_sq	cos(audio, quality_text)	"pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
3	clap_lp	cos(audio, prompt)	Original DramaBox prompt
4	clap_sp	cos(audio, prompt)	Original DramaBox prompt
5	v1_nat_L	cos(audio, nat)	"natural, spontaneous, lifelike speech with genuine emotion"
6	v2_auth_L	cos(audio, auth)	"authentic, emotionally truthful, deeply felt voice performance"
7	v3_pro_L	cos(audio, pro)	"professional studio recording, crystal clear high-fidelity audio"
8	v4_expr_L	cos(audio, expr)	"expressive, dynamic voice acting with rich emotional range"
9	v5_cine_L	cos(audio, cine)	"immersive cinematic narration, compelling storytelling"
10	v6_nat_S	cos(audio, nat)	"natural, spontaneous, lifelike speech with genuine emotion"
11	v7_auth_S	cos(audio, auth)	"authentic, emotionally truthful, deeply felt voice performance"
12	v8_pro_S	cos(audio, pro)	"professional studio recording, crystal clear high-fidelity audio"
13	v9_nr_L	cos(audio, nat) − cos(audio, rob)	+ "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
14	v10_ac_L	cos(audio, auth) − cos(audio, cheap)	+ "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
15	v11_pd_L	cos(audio, pro) − cos(audio, dist)	+ "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
16	v12_ef_L	cos(audio, expr) − cos(audio, flat)	+ "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
17	v13_ff_L	cos(audio, full_pos) − cos(audio, full_neg)	+ "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
18	v14_wr_L	cos(audio, warm) − cos(audio, rob)	+ "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
19	v15_nr_S	cos(audio, nat) − cos(audio, rob)	+ "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
20	v16_ac_S	cos(audio, auth) − cos(audio, cheap)	+ "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
21	v17_pd_S	cos(audio, pro) − cos(audio, dist)	+ "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
22	v18_ef_S	cos(audio, expr) − cos(audio, flat)	+ "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
23	v19_ff_S	cos(audio, full_pos) − cos(audio, full_neg)	+ "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
24	v20_wr_S	cos(audio, warm) − cos(audio, rob)	+ "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
25	v21_san_L	cos(audio, sanitized_prompt)	Prompt with quoted speech removed (Large)
26	v22_san_S	cos(audio, sanitized_prompt)	Prompt with quoted speech removed (Small)
27	v23_snr_L	cos(audio, sanitized) − cos(audio, neg_san)	+ Prompt with quoted speech removed / − "robotic, distorted, uncanny" (Large)
28	v24_snr_S	cos(audio, sanitized) − cos(audio, neg_san)	+ Prompt with quoted speech removed / − "robotic, distorted, uncanny" (Small)
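The two score shapes in the table, a single cosine similarity and a positive-minus-negative contrast, can be sketched as follows. The embedding vectors here are toy placeholders standing in for whatever an audio/text encoder would return. No real VoiceCLAP API is used, and the helper names (`cosine`, `single_prompt_score`, `contrastive_score`) are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def single_prompt_score(audio: Sequence[float], text: Sequence[float]) -> float:
    """Score shape of methods 1-12, 25, 26: cos(audio, text)."""
    return cosine(audio, text)


def contrastive_score(
    audio: Sequence[float], pos: Sequence[float], neg: Sequence[float]
) -> float:
    """Score shape of methods 13-24, 27, 28: cos(audio, pos) - cos(audio, neg)."""
    return cosine(audio, pos) - cosine(audio, neg)


# Toy 3-dimensional embeddings (assumed values, for illustration only).
audio_emb = [1.0, 0.0, 0.0]
natural_emb = [1.0, 1.0, 0.0]   # stands in for the "natural, spontaneous..." text
robotic_emb = [0.0, 1.0, 0.0]   # stands in for the "robotic, mechanical..." text

print(single_prompt_score(audio_emb, natural_emb))
print(contrastive_score(audio_emb, natural_emb, robotic_emb))
```

Either score would then be passed through the reward formula, so a candidate that sits closer to the negative text than to the positive one gets a reward of zero.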

Cross-Method Diminishing Returns Comparison

Method	N=5	N=10	N=25	N=50	N=100	Gain N=5→100	Knee Point
Standard: (1−WER) × Content Enjoyment	3.9211	4.0197	4.0497	4.1281	4.1523	+0.2311	N=25
VoiceCLAP-Large × Quality Text	0.9132	0.9378	0.9521	0.9581	0.9605	+0.0473	N=50
VoiceCLAP-Small × Quality Text	0.7602	0.7738	0.7914	0.8037	0.8143	+0.0542	N=50
VoiceCLAP-Large × Prompt Match	1.3985	1.4249	1.4389	1.4555	1.4595	+0.0610	N=25
VoiceCLAP-Small × Prompt Match	0.8512	0.8702	0.8706	0.8918	0.9013	+0.0501	N=25
v1 Natural (Large)	0.9764	1.0031	1.0191	1.0278	1.0301	+0.0537	N=50
v2 Authentic (Large)	1.0138	1.0394	1.0529	1.0635	1.0638	+0.0500	N=25
v3 Professional (Large)	0.8590	0.8849	0.8977	0.9053	0.9067	+0.0476	N=25
v4 Expressive (Large)	1.0251	1.0429	1.0563	1.0740	1.0753	+0.0502	N=25
v5 Cinematic (Large)	0.9964	1.0109	1.0247	1.0376	1.0394	+0.0430	N=25
v6 Natural (Small)	0.7862	0.8159	0.8379	0.8518	0.8621	+0.0759	N=50
v7 Authentic (Small)	0.7493	0.7736	0.8021	0.8069	0.8180	+0.0687	N=50
v8 Professional (Small)	0.6313	0.6456	0.6537	0.6698	0.6767	+0.0454	N=25
v9 Natural−Robotic (Large)	1.7559	1.7926	1.8109	1.8245	1.8275	+0.0716	N=25
v10 Authentic−Cheap (Large)	1.8424	1.8812	1.9048	1.9199	1.9217	+0.0793	N=25
v11 Professional−Distorted (Large)	1.7442	1.7871	1.8136	1.8248	1.8271	+0.0828	N=50
v12 Expressive−Flat (Large)	1.7568	1.7898	1.8107	1.8367	1.8419	+0.0851	N=25
v13 FullPos−FullNeg (Large)	1.7593	1.7959	1.8217	1.8302	1.8325	+0.0732	N=25
v14 Warm−Robotic (Large)	1.6689	1.7077	1.7273	1.7379	1.7454	+0.0765	N=25
v15 Natural−Robotic (Small)	1.8332	1.8758	1.9142	1.9200	1.9358	+0.1026	N=50
v16 Authentic−Cheap (Small)	1.8533	1.8982	1.9543	1.9610	1.9857	+0.1324	N=50
v17 Professional−Distorted (Small)	1.7020	1.7347	1.7577	1.7740	1.7744	+0.0725	N=25
v18 Expressive−Flat (Small)	1.5764	1.6263	1.6467	1.6868	1.7039	+0.1275	N=25
v19 FullPos−FullNeg (Small)	1.6394	1.6768	1.7053	1.7219	1.7244	+0.0849	N=50
v20 Warm−Robotic (Small)	1.7189	1.7620	1.8005	1.8227	1.8289	+0.1100	N=50
v21 Sanitized Prompt (Large)	1.1563	1.1783	1.1981	1.2061	1.2133	+0.0571	N=50
v22 Sanitized Prompt (Small)	0.8343	0.8474	0.8450	0.8684	0.8791	+0.0448	N=25
v23 Sanitized−Uncanny (Large)	1.9570	1.9875	2.0213	2.0312	2.0425	+0.0856	N=50
v24 Sanitized−Uncanny (Small)	1.7907	1.8339	1.8565	1.9014	1.9077	+0.1170	N=25

[Figure: Diminishing Returns, All Methods Overlaid]

Marginal Improvement per Additional Candidate

Method	N=5→10	N=10→25	N=25→50	N=50→100
Standard: (1−WER) × Content Enjoyment	0.01971/cand (2.5%)	0.00200/cand (0.7%)	0.00313/cand (1.9%)	0.00048/cand (0.6%)
VoiceCLAP-Large × Quality Text	0.00492/cand (2.7%)	0.00095/cand (1.5%)	0.00024/cand (0.6%)	0.00005/cand (0.2%)
VoiceCLAP-Small × Quality Text	0.00272/cand (1.8%)	0.00118/cand (2.3%)	0.00049/cand (1.6%)	0.00021/cand (1.3%)
VoiceCLAP-Large × Prompt Match	0.00529/cand (1.9%)	0.00093/cand (1.0%)	0.00067/cand (1.2%)	0.00008/cand (0.3%)
VoiceCLAP-Small × Prompt Match	0.00381/cand (2.2%)	0.00003/cand (0.0%)	0.00085/cand (2.4%)	0.00019/cand (1.1%)
v1 Natural (Large)	0.00534/cand (2.7%)	0.00106/cand (1.6%)	0.00035/cand (0.9%)	0.00005/cand (0.2%)
v2 Authentic (Large)	0.00511/cand (2.5%)	0.00090/cand (1.3%)	0.00042/cand (1.0%)	0.00001/cand (0.0%)
v3 Professional (Large)	0.00518/cand (3.0%)	0.00085/cand (1.4%)	0.00030/cand (0.8%)	0.00003/cand (0.2%)
v4 Expressive (Large)	0.00356/cand (1.7%)	0.00090/cand (1.3%)	0.00071/cand (1.7%)	0.00003/cand (0.1%)
v5 Cinematic (Large)	0.00289/cand (1.4%)	0.00092/cand (1.4%)	0.00051/cand (1.3%)	0.00004/cand (0.2%)
v6 Natural (Small)	0.00594/cand (3.8%)	0.00147/cand (2.7%)	0.00056/cand (1.7%)	0.00021/cand (1.2%)
v7 Authentic (Small)	0.00486/cand (3.2%)	0.00190/cand (3.7%)	0.00019/cand (0.6%)	0.00022/cand (1.4%)
v8 Professional (Small)	0.00285/cand (2.3%)	0.00054/cand (1.3%)	0.00064/cand (2.5%)	0.00014/cand (1.0%)
v9 Natural−Robotic (Large)	0.00735/cand (2.1%)	0.00122/cand (1.0%)	0.00054/cand (0.7%)	0.00006/cand (0.2%)
v10 Authentic−Cheap (Large)	0.00774/cand (2.1%)	0.00158/cand (1.3%)	0.00060/cand (0.8%)	0.00004/cand (0.1%)
v11 Professional−Distorted (Large)	0.00857/cand (2.5%)	0.00176/cand (1.5%)	0.00045/cand (0.6%)	0.00005/cand (0.1%)
v12 Expressive−Flat (Large)	0.00661/cand (1.9%)	0.00139/cand (1.2%)	0.00104/cand (1.4%)	0.00010/cand (0.3%)
v13 FullPos−FullNeg (Large)	0.00733/cand (2.1%)	0.00172/cand (1.4%)	0.00034/cand (0.5%)	0.00005/cand (0.1%)
v14 Warm−Robotic (Large)	0.00775/cand (2.3%)	0.00131/cand (1.2%)	0.00042/cand (0.6%)	0.00015/cand (0.4%)
v15 Natural−Robotic (Small)	0.00851/cand (2.3%)	0.00256/cand (2.1%)	0.00023/cand (0.3%)	0.00032/cand (0.8%)
v16 Authentic−Cheap (Small)	0.00897/cand (2.4%)	0.00375/cand (3.0%)	0.00027/cand (0.3%)	0.00049/cand (1.3%)
v17 Professional−Distorted (Small)	0.00655/cand (1.9%)	0.00154/cand (1.3%)	0.00065/cand (0.9%)	0.00001/cand (0.0%)
v18 Expressive−Flat (Small)	0.00998/cand (3.2%)	0.00136/cand (1.3%)	0.00160/cand (2.4%)	0.00034/cand (1.0%)
v19 FullPos−FullNeg (Small)	0.00748/cand (2.3%)	0.00190/cand (1.7%)	0.00066/cand (1.0%)	0.00005/cand (0.1%)
v20 Warm−Robotic (Small)	0.00863/cand (2.5%)	0.00256/cand (2.2%)	0.00089/cand (1.2%)	0.00012/cand (0.3%)
v21 Sanitized Prompt (Large)	0.00441/cand (1.9%)	0.00132/cand (1.7%)	0.00032/cand (0.7%)	0.00015/cand (0.6%)
v22 Sanitized Prompt (Small)	0.00263/cand (1.6%)	-0.00016/cand (-0.3%)	0.00094/cand (2.8%)	0.00021/cand (1.2%)
v23 Sanitized−Uncanny (Large)	0.00611/cand (1.6%)	0.00225/cand (1.7%)	0.00040/cand (0.5%)	0.00023/cand (0.6%)
v24 Sanitized−Uncanny (Small)	0.00864/cand (2.4%)	0.00151/cand (1.2%)	0.00180/cand (2.4%)	0.00013/cand (0.3%)
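The entries in the marginal-improvement table are consistent with dividing the change in average best reward by the number of added candidates, with the percentage taken relative to the lower-N value. The sketch below applies that reading to the Standard row. The function name `marginal_gains` is an assumption. Because it starts from the rounded four-decimal values, the last digit can differ slightly from the table.

```python
N_VALUES = [5, 10, 25, 50, 100]
# Avg best reward for "Standard: (1-WER) x Content Enjoyment" from the comparison table.
STANDARD_AVG_BEST = [3.9211, 4.0197, 4.0497, 4.1281, 4.1523]


def marginal_gains(ns, best):
    """Return (n_from, n_to, gain_per_candidate, percent_gain) for each step."""
    rows = []
    for (n0, b0), (n1, b1) in zip(zip(ns, best), zip(ns[1:], best[1:])):
        delta = b1 - b0
        per_candidate = delta / (n1 - n0)   # improvement per extra candidate
        percent = 100.0 * delta / b0        # relative to the smaller-N value
        rows.append((n0, n1, per_candidate, percent))
    return rows


for n0, n1, per_cand, pct in marginal_gains(N_VALUES, STANDARD_AVG_BEST):
    print(f"N={n0}->{n1}: {per_cand:.5f}/cand ({pct:.1f}%)")
```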

Ablation: Pronunciation Suffix Effect

Comparison of N=10 candidates without the pronunciation suffix vs. N=10 candidates with it. Deltas are with-suffix minus without-suffix.

Ranking Method	No Suffix Mean	No Suffix Best	No Suffix Median	With Suffix Mean	With Suffix Best	With Suffix Median	Δ Mean	Δ Best
Standard: (1−WER) × Content Enjoyment	3.6567	4.0145	3.6765	3.6493	4.0168	3.6526	-0.0074	+0.0023
VoiceCLAP-Large × Quality Text	0.8567	0.9328	0.8629	0.7588	0.8205	0.7635	-0.0980	-0.1123
VoiceCLAP-Small × Quality Text	0.6747	0.7832	0.6839	0.7588	0.8205	0.7635	+0.0840	+0.0374
VoiceCLAP-Large × Prompt Match	1.3057	1.4302	1.3155	0.7588	0.8205	0.7635	-0.5469	-0.6096
VoiceCLAP-Small × Prompt Match	0.7621	0.8696	0.7591	0.7588	0.8205	0.7635	-0.0033	-0.0490
v1 Natural (Large)	0.9153	1.0021	0.9227	0.7588	0.8205	0.7635	-0.1565	-0.1816
v2 Authentic (Large)	0.9506	1.0405	0.9580	0.7588	0.8205	0.7635	-0.1918	-0.2199
v3 Professional (Large)	0.8064	0.8780	0.8119	0.7588	0.8205	0.7635	-0.0477	-0.0575
v4 Expressive (Large)	0.9475	1.0408	0.9538	0.7588	0.8205	0.7635	-0.1887	-0.2203
v5 Cinematic (Large)	0.9211	1.0098	0.9252	0.7588	0.8205	0.7635	-0.1623	-0.1893
v6 Natural (Small)	0.7235	0.8275	0.7280	0.7588	0.8205	0.7635	+0.0352	-0.0069
v7 Authentic (Small)	0.6886	0.7783	0.6950	0.7588	0.8205	0.7635	+0.0701	+0.0422
v8 Professional (Small)	0.5635	0.6468	0.5711	0.7588	0.8205	0.7635	+0.1952	+0.1737
v9 Natural−Robotic (Large)	1.6489	1.8054	1.6572	1.5175	1.6410	1.5270	-0.1314	-0.1644
v10 Authentic−Cheap (Large)	1.7297	1.8894	1.7401	1.5175	1.6410	1.5270	-0.2122	-0.2484
v11 Professional−Distorted (Large)	1.6383	1.7835	1.6466	1.5175	1.6410	1.5270	-0.1208	-0.1425
v12 Expressive−Flat (Large)	1.6351	1.7935	1.6423	1.5175	1.6410	1.5270	-0.1175	-0.1525
v13 FullPos−FullNeg (Large)	1.6536	1.8030	1.6631	1.5175	1.6410	1.5270	-0.1361	-0.1620
v14 Warm−Robotic (Large)	1.5721	1.7130	1.5773	1.5175	1.6410	1.5270	-0.0546	-0.0720
v15 Natural−Robotic (Small)	1.7134	1.8816	1.7227	1.5175	1.6410	1.5270	-0.1959	-0.2406
v16 Authentic−Cheap (Small)	1.7184	1.9078	1.7342	1.5175	1.6410	1.5270	-0.2009	-0.2668
v17 Professional−Distorted (Small)	1.5827	1.7431	1.5878	1.5175	1.6410	1.5270	-0.0652	-0.1021
v18 Expressive−Flat (Small)	1.4598	1.6350	1.4590	1.5175	1.6410	1.5270	+0.0577	+0.0060
v19 FullPos−FullNeg (Small)	1.5350	1.6829	1.5439	1.5175	1.6410	1.5270	-0.0175	-0.0419
v20 Warm−Robotic (Small)	1.5976	1.7702	1.5925	1.5175	1.6410	1.5270	-0.0801	-0.1292
v21 Sanitized Prompt (Large)	1.0789	1.1856	1.0866	0.7588	0.8205	0.7635	-0.3201	-0.3651
v22 Sanitized Prompt (Small)	0.7356	0.8370	0.7342	0.7588	0.8205	0.7635	+0.0231	-0.0165
v23 Sanitized−Uncanny (Large)	1.8337	2.0140	1.8456	1.5175	1.6410	1.5270	-0.3162	-0.3730
v24 Sanitized−Uncanny (Small)	1.6489	1.8444	1.6493	1.5175	1.6410	1.5270	-0.1314	-0.2034

Per-Prompt Ablation: Standard Reward (N=10)

#	Lang	No Suffix Mean	No Suffix Best	With Suffix Mean	With Suffix Best	Δ Mean	Δ Best
0	English	4.4127	4.6337	4.3077	4.8373	-0.1050	+0.2036
1	French	4.7646	4.9466	4.7331	4.9823	-0.0316	+0.0357
2	English	1.1174	1.2710	1.1804	1.2948	+0.0630	+0.0238
3	German	4.7972	4.8971	4.7827	4.8982	-0.0145	+0.0011
4	French	4.5382	4.7537	4.5600	4.8114	+0.0218	+0.0577
5	French	3.5284	4.1725	3.7767	4.4835	+0.2483	+0.3110
6	English	3.0369	3.7504	3.1401	3.6910	+0.1032	-0.0594
7	German	2.2224	2.8868	2.1253	2.5059	-0.0972	-0.3809
8	Spanish	4.7732	5.0789	4.7247	4.8181	-0.0485	-0.2608
9	French	3.3758	3.7541	3.1628	3.8451	-0.2130	+0.0910
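Averaging the per-prompt rows above reproduces the Standard row of the suffix ablation summary (mean 3.6567 vs. 3.6493, best 4.0145 vs. 4.0168) to within rounding. The sketch below shows that aggregation. The tuple layout and the name `summarize` are illustrative assumptions.

```python
from statistics import mean

# Per-prompt Standard reward at N=10:
# (no_suffix_mean, no_suffix_best, with_suffix_mean, with_suffix_best)
ROWS = [
    (4.4127, 4.6337, 4.3077, 4.8373),
    (4.7646, 4.9466, 4.7331, 4.9823),
    (1.1174, 1.2710, 1.1804, 1.2948),
    (4.7972, 4.8971, 4.7827, 4.8982),
    (4.5382, 4.7537, 4.5600, 4.8114),
    (3.5284, 4.1725, 3.7767, 4.4835),
    (3.0369, 3.7504, 3.1401, 3.6910),
    (2.2224, 2.8868, 2.1253, 2.5059),
    (4.7732, 5.0789, 4.7247, 4.8181),
    (3.3758, 3.7541, 3.1628, 3.8451),
]


def summarize(rows):
    """Average each column over prompts and compute with-minus-without deltas."""
    no_mean = mean(r[0] for r in rows)
    no_best = mean(r[1] for r in rows)
    with_mean = mean(r[2] for r in rows)
    with_best = mean(r[3] for r in rows)
    return {
        "no_suffix_mean": no_mean,
        "no_suffix_best": no_best,
        "with_suffix_mean": with_mean,
        "with_suffix_best": with_best,
        "delta_mean": with_mean - no_mean,
        "delta_best": with_best - no_best,
    }


for key, value in summarize(ROWS).items():
    print(f"{key}: {value:+.4f}")
```

The averaged deltas are much smaller than the per-prompt ones (for example, −0.3809 on prompt 7 and +0.3110 on prompt 5 for Δ Best), so the near-zero aggregate hides opposite-signed effects across prompts.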

Standard: (1−WER) × Content Enjoyment — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	3.9211	4.0197	4.0497	4.1281	4.1523
Std Dev	1.2129	1.2510	1.2175	1.2375	1.2120
Avg Mean	3.6636	3.6884	3.6003	3.6224	3.6181
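The source does not state how the N-candidate subsets were drawn from the 100 candidates per prompt. As a sketch of one common estimator, not necessarily the procedure used here, the expected best-of-N reward can be approximated by repeatedly sampling N candidates without replacement and averaging the maximum. The reward pool below is synthetic, and the function name, trial count, and seeds are assumptions.

```python
import random
from statistics import mean


def expected_best_of_n(rewards, n, trials=2000, seed=0):
    """Monte Carlo estimate of E[max reward] over random size-n subsets."""
    rng = random.Random(seed)
    return mean(max(rng.sample(rewards, n)) for _ in range(trials))


# Synthetic pool of 100 candidate rewards for a single prompt (illustrative only).
pool_rng = random.Random(42)
pool = [pool_rng.gauss(3.6, 0.4) for _ in range(100)]

for n in (5, 10, 25, 50, 100):
    print(f"N={n}: {expected_best_of_n(pool, n):.4f}")
```

With this estimator the N=100 value is simply the pool maximum, and smaller N can never exceed it, which gives the saturating curve that a diminishing-returns analysis looks for.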

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	4.7776	4.7938	4.8550	4.8415	4.8550
1	French	4.8704	5.0228	5.0776	5.0776	5.0776
2	English	1.2889	1.3107	1.3004	1.2832	1.3107
3	German	4.8286	4.9408	4.9466	4.9466	4.9687
4	French	4.6758	4.7537	4.7341	4.7982	4.7982
5	French	4.0560	4.3700	4.3700	4.6713	4.6713
6	English	3.8028	3.7903	3.7252	3.9250	3.9250
7	German	2.4948	2.4318	2.7525	2.7623	2.9413
8	Spanish	4.9337	5.0161	4.9645	5.0789	5.0789
9	French	3.4829	3.7670	3.7712	3.8962	3.8962
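The "Avg Best" and "Std Dev" rows can be recovered from the per-prompt table: for the Standard method at N=5, the mean over the ten prompts matches 3.9211, and the sample standard deviation (n − 1 denominator) matches 1.2129. The check below uses only the N=5 column. The variable names are illustrative.

```python
from statistics import mean, stdev

# Per-prompt best reward at N=5 for the Standard method (prompts 0-9).
PER_PROMPT_BEST_N5 = [
    4.7776, 4.8704, 1.2889, 4.8286, 4.6758,
    4.0560, 3.8028, 2.4948, 4.9337, 3.4829,
]

avg_best = mean(PER_PROMPT_BEST_N5)
std_dev = stdev(PER_PROMPT_BEST_N5)  # sample standard deviation (n - 1)

print(f"Avg Best: {avg_best:.4f}")
print(f"Std Dev:  {std_dev:.4f}")
```

The standard deviation therefore describes spread across prompts rather than across candidates. It is dominated by low-scoring prompts such as prompt 2 (about 1.3) and prompt 7 (about 2.5 to 2.9).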

VoiceCLAP-Large × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.9132	0.9378	0.9521	0.9581	0.9605
Std Dev	0.2877	0.2927	0.2786	0.2761	0.2770
Avg Mean	0.8536	0.8576	0.8394	0.8467	0.8453

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.1582	1.2039	1.2094	1.2039	1.2094
1	French	1.1698	1.1680	1.1825	1.1825	1.1825
2	English	0.3376	0.3430	0.3653	0.3653	0.3653
3	German	1.1379	1.1392	1.1497	1.1544	1.1544
4	French	1.2514	1.2718	1.2643	1.2616	1.2718
5	French	0.8476	0.9030	0.9030	0.9202	0.9202
6	English	0.8922	0.9158	0.9198	0.9198	0.9198
7	German	0.5721	0.5668	0.6462	0.6604	0.6685
8	Spanish	0.8912	0.9030	0.9193	0.9251	0.9251
9	French	0.8738	0.9634	0.9613	0.9876	0.9876

VoiceCLAP-Small × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.7602	0.7738	0.7914	0.8037	0.8143
Std Dev	0.2858	0.2947	0.2852	0.2913	0.2882
Avg Mean	0.6834	0.6801	0.6552	0.6653	0.6646

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	0.7934	0.8073	0.8373	0.8412	0.8569
1	French	1.0337	1.0299	1.0184	1.0643	1.0643
2	English	0.2656	0.2653	0.2673	0.2660	0.2673
3	German	1.0827	1.0635	1.0875	1.1005	1.1113
4	French	1.1299	1.1572	1.1713	1.1713	1.1713
5	French	0.5162	0.5541	0.6197	0.6036	0.6197
6	English	0.7566	0.7612	0.7731	0.7731	0.8069
7	German	0.4837	0.4344	0.5363	0.5298	0.5582
8	Spanish	0.6328	0.6664	0.6059	0.6664	0.6664
9	French	0.9072	0.9983	0.9973	1.0211	1.0211

VoiceCLAP-Large × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.3985	1.4249	1.4389	1.4555	1.4595
Std Dev	0.4499	0.4530	0.4418	0.4390	0.4376
Avg Mean	1.3025	1.3116	1.2775	1.2882	1.2877

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.7348	1.7763	1.7653	1.7775	1.7775
1	French	1.8340	1.8494	1.8364	1.8560	1.8560
2	English	0.4611	0.4625	0.4611	0.4655	0.4659
3	German	1.7875	1.8278	1.8309	1.8294	1.8347
4	French	1.5893	1.5869	1.6135	1.6175	1.6175
5	French	1.5492	1.6133	1.6329	1.6618	1.6618
6	English	1.3255	1.2953	1.3021	1.3255	1.3512
7	German	0.8214	0.8595	0.9457	0.9935	1.0018
8	Spanish	1.6502	1.6439	1.6496	1.6631	1.6631
9	French	1.2317	1.3342	1.3515	1.3655	1.3655

VoiceCLAP-Small × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.8512	0.8702	0.8706	0.8918	0.9013
Std Dev	0.2877	0.2879	0.2868	0.2893	0.2847
Avg Mean	0.7663	0.7682	0.7461	0.7538	0.7526

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.3195	1.3274	1.3233	1.3367	1.3367
1	French	0.8898	0.9041	0.9000	0.9000	0.9085
2	English	0.3192	0.2841	0.3029	0.3023	0.3192
3	German	1.1300	1.1490	1.1623	1.1907	1.1907
4	French	0.8509	0.8983	0.8778	0.9097	0.9097
5	French	0.8920	1.0036	1.0036	0.9609	1.0036
6	English	1.0857	0.9828	1.0145	1.0892	1.0892
7	German	0.5828	0.6266	0.6308	0.6702	0.6974
8	Spanish	0.7441	0.7398	0.7505	0.7717	0.7717
9	French	0.6980	0.7867	0.7406	0.7867	0.7867

v1 Natural (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.9764	1.0031	1.0191	1.0278	1.0301
Std Dev	0.2900	0.2936	0.2862	0.2853	0.2837
Avg Mean	0.9106	0.9142	0.8959	0.9024	0.9014

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.2753	1.3070	1.3020	1.3155	1.3155
1	French	1.2256	1.2280	1.2585	1.2585	1.2585
2	English	0.3817	0.3846	0.3986	0.3986	0.3986
3	German	1.1115	1.1171	1.1289	1.1289	1.1289
4	French	1.3096	1.3124	1.3213	1.3162	1.3213
5	French	0.9475	1.0200	1.0200	1.0337	1.0337
6	English	0.9296	0.9727	0.9906	0.9906	0.9906
7	German	0.6304	0.6356	0.6936	0.7113	0.7295
8	Spanish	0.9762	0.9799	0.9951	1.0209	1.0209
9	French	0.9769	1.0742	1.0820	1.1036	1.1036

v2 Authentic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.0138	1.0394	1.0529	1.0635	1.0638
Std Dev	0.2793	0.2889	0.2741	0.2746	0.2738
Avg Mean	0.9458	0.9512	0.9299	0.9381	0.9367

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.1419	1.1758	1.1785	1.1811	1.1811
1	French	1.2850	1.2929	1.2979	1.2979	1.2979
2	English	0.4014	0.3923	0.4102	0.4096	0.4129
3	German	1.1828	1.2004	1.1983	1.2011	1.2011
4	French	1.2851	1.3088	1.3066	1.3084	1.3088
5	French	1.0603	1.1132	1.1132	1.1470	1.1470
6	English	0.9537	0.9756	0.9730	0.9902	0.9902
7	German	0.6878	0.6898	0.7822	0.7958	0.7958
8	Spanish	1.1612	1.1747	1.1853	1.1952	1.1952
9	French	0.9792	1.0704	1.0842	1.1085	1.1085

v3 Professional (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.8590	0.8849	0.8977	0.9053	0.9067
Std Dev	0.2640	0.2688	0.2526	0.2513	0.2522
Avg Mean	0.8024	0.8067	0.7904	0.7970	0.7955

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.0653	1.1191	1.1217	1.1217	1.1217
1	French	1.0959	1.1022	1.1109	1.1109	1.1109
2	English	0.3224	0.3292	0.3578	0.3578	0.3578
3	German	1.0931	1.0896	1.0977	1.1065	1.1065
4	French	1.1510	1.1686	1.1652	1.1605	1.1686
5	French	0.7823	0.8424	0.8424	0.8601	0.8601
6	English	0.8638	0.8963	0.8904	0.8904	0.8963
7	German	0.5541	0.5473	0.6301	0.6391	0.6391
8	Spanish	0.8439	0.8552	0.8648	0.8825	0.8825
9	French	0.8183	0.8992	0.8958	0.9231	0.9231

v4 Expressive (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.0251	1.0429	1.0563	1.0740	1.0753
Std Dev	0.2986	0.2884	0.2893	0.2944	0.2924
Avg Mean	0.9443	0.9516	0.9309	0.9360	0.9348

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.2831	1.2912	1.2918	1.2918	1.2947
1	French	1.2773	1.2643	1.3222	1.3222	1.3222
2	English	0.3935	0.4168	0.4045	0.4091	0.4168
3	German	1.0329	1.0660	1.0660	1.0745	1.0749
4	French	1.2343	1.2362	1.2646	1.2948	1.2948
5	French	1.1254	1.1557	1.1559	1.2255	1.2255
6	English	1.1776	1.1656	1.1440	1.1776	1.1776
7	German	0.6135	0.6409	0.7016	0.7079	0.7101
8	Spanish	1.1684	1.1759	1.1781	1.1833	1.1833
9	French	0.9445	1.0161	1.0347	1.0530	1.0530

v5 Cinematic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.9964	1.0109	1.0247	1.0376	1.0394
Std Dev	0.2724	0.2761	0.2649	0.2702	0.2691
Avg Mean	0.9184	0.9236	0.9017	0.9087	0.9075

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.1097	1.1111	1.1278	1.1278	1.1313
1	French	1.2325	1.2483	1.2467	1.2661	1.2661
2	English	0.3990	0.3969	0.3965	0.4087	0.4087
3	German	1.0830	1.1234	1.1234	1.1196	1.1234
4	French	1.2276	1.2386	1.2463	1.2566	1.2566
5	French	1.1481	1.1758	1.1852	1.2318	1.2318
6	English	1.1073	1.0681	1.0774	1.1073	1.1073
7	German	0.6179	0.6321	0.7295	0.7186	0.7295
8	Spanish	1.0503	1.0536	1.0467	1.0595	1.0595
9	French	0.9890	1.0608	1.0677	1.0798	1.0798

v6 Natural (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.7862	0.8159	0.8379	0.8518	0.8621
Std Dev	0.2482	0.2556	0.2474	0.2531	0.2501
Avg Mean	0.7243	0.7259	0.7031	0.7109	0.7100

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	0.6108	0.6347	0.6967	0.6987	0.7277
1	French	1.0956	1.0962	1.1180	1.1404	1.1404
2	English	0.3171	0.3042	0.3166	0.3222	0.3222
3	German	1.0255	1.0044	1.0577	1.0503	1.0577
4	French	1.0307	1.0307	1.0817	1.0824	1.0824
5	French	0.6157	0.7201	0.7186	0.7417	0.7417
6	English	0.7917	0.8660	0.8145	0.8410	0.8845
7	German	0.5967	0.5642	0.6810	0.6609	0.6839
8	Spanish	0.8426	0.8955	0.8782	0.9215	0.9215
9	French	0.9360	1.0432	1.0160	1.0593	1.0593

v7 Authentic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.7493	0.7736	0.8021	0.8069	0.8180
Std Dev	0.2233	0.2350	0.2309	0.2282	0.2268
Avg Mean	0.6895	0.6906	0.6677	0.6769	0.6759

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	0.6287	0.6399	0.7065	0.7135	0.7255
1	French	1.0487	1.0506	1.0779	1.0779	1.0779
2	English	0.3194	0.3202	0.3207	0.3268	0.3268
3	German	0.9714	0.9818	1.0153	0.9954	1.0153
4	French	0.9504	0.9694	0.9932	1.0018	1.0018
5	French	0.6871	0.7827	0.7645	0.8041	0.8041
6	English	0.6475	0.7115	0.7146	0.7146	0.7555
7	German	0.5543	0.4889	0.5921	0.5906	0.6122
8	Spanish	0.8518	0.8811	0.8979	0.9061	0.9061
9	French	0.8337	0.9098	0.9381	0.9381	0.9549

v8 Professional (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.6313	0.6456	0.6537	0.6698	0.6767
Std Dev	0.2373	0.2383	0.2276	0.2291	0.2330
Avg Mean	0.5696	0.5666	0.5493	0.5564	0.5563

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	0.8213	0.8085	0.8463	0.8506	0.8596
1	French	0.8086	0.8257	0.8056	0.8390	0.8402
2	English	0.2282	0.2307	0.2376	0.2381	0.2394
3	German	0.8733	0.8627	0.8786	0.8831	0.8922
4	French	0.9374	0.9644	0.9253	0.9374	0.9644
5	French	0.3853	0.4351	0.4726	0.4569	0.4726
6	English	0.6150	0.6312	0.6484	0.6590	0.6590
7	German	0.4163	0.4004	0.4614	0.5038	0.5038
8	Spanish	0.5215	0.5338	0.4931	0.5338	0.5338
9	French	0.7063	0.7634	0.7680	0.7963	0.8020

v9 Natural−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.7559	1.7926	1.8109	1.8245	1.8275
Std Dev	0.5045	0.5081	0.4897	0.4920	0.4902
Avg Mean	1.6442	1.6539	1.6134	1.6264	1.6253

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.1736	2.1736	2.1747	2.1848	2.1848
1	French	2.1287	2.1429	2.1472	2.1500	2.1505
2	English	0.6030	0.6056	0.6116	0.6123	0.6123
3	German	2.0797	2.0888	2.0877	2.0945	2.0945
4	French	2.2030	2.2117	2.2084	2.2203	2.2203
5	French	1.8193	1.9156	1.9156	1.9413	1.9413
6	English	1.6918	1.7193	1.7391	1.7391	1.7527
7	German	1.2064	1.2208	1.3700	1.3945	1.4109
8	Spanish	1.9343	1.9471	1.9591	1.9998	1.9998
9	French	1.7191	1.9010	1.8959	1.9081	1.9081

v10 Authentic−Cheap (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.8424	1.8812	1.9048	1.9199	1.9217
Std Dev	0.5175	0.5344	0.5107	0.5145	0.5129
Avg Mean	1.7253	1.7352	1.6914	1.7073	1.7052

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.9587	1.9875	1.9724	1.9875	1.9875
1	French	2.3190	2.3273	2.3388	2.3388	2.3388
2	English	0.6691	0.6567	0.6737	0.6737	0.6778
3	German	2.1445	2.1866	2.1777	2.1866	2.1866
4	French	2.3130	2.3429	2.3539	2.3539	2.3539
5	French	2.0317	2.1040	2.1151	2.1764	2.1764
6	English	1.7215	1.7313	1.7306	1.7548	1.7686
7	German	1.2738	1.2732	1.4685	1.4767	1.4767
8	Spanish	2.1688	2.1897	2.1943	2.2145	2.2145
9	French	1.8242	2.0123	2.0234	2.0364	2.0364

v11 Professional−Distorted (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.7442	1.7871	1.8136	1.8248	1.8271
Std Dev	0.5054	0.5156	0.4874	0.4867	0.4879
Avg Mean	1.6298	1.6395	1.6025	1.6164	1.6138

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.1109	2.1886	2.2134	2.2134	2.2134
1	French	2.2090	2.2086	2.2248	2.2248	2.2248
2	English	0.6577	0.6694	0.6978	0.6978	0.6978
3	German	2.0262	2.0254	2.0450	2.0467	2.0467
4	French	2.2768	2.3032	2.2888	2.2982	2.3085
5	French	1.7806	1.8894	1.8894	1.9319	1.9319
6	English	1.7203	1.7584	1.7515	1.7588	1.7588
7	German	1.1253	1.1100	1.2946	1.3197	1.3197
8	Spanish	1.8646	1.8799	1.8694	1.8951	1.8951
9	French	1.6710	1.8382	1.8610	1.8617	1.8740

v12 Expressive−Flat (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.7568	1.7898	1.8107	1.8367	1.8419
Std Dev	0.5213	0.5045	0.5028	0.5073	0.5068
Avg Mean	1.6297	1.6452	1.6049	1.6148	1.6134

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.2028	2.2028	2.1993	2.2028	2.2028
1	French	2.1866	2.1627	2.2576	2.2576	2.2576
2	English	0.5835	0.6100	0.6079	0.6168	0.6168
3	German	1.8672	1.9043	1.9078	1.9181	1.9483
4	French	2.0900	2.1050	2.1191	2.1535	2.1535
5	French	1.9551	2.0159	2.0159	2.1084	2.1084
6	English	1.9193	1.9044	1.8824	1.9230	1.9333
7	German	1.1260	1.1998	1.2958	1.3208	1.3324
8	Spanish	2.0324	2.0405	2.0544	2.0812	2.0812
9	French	1.6048	1.7527	1.7668	1.7846	1.7846

v13 FullPos−FullNeg (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.7593	1.7959	1.8217	1.8302	1.8325
Std Dev	0.5255	0.5299	0.5103	0.5108	0.5090
Avg Mean	1.6470	1.6573	1.6184	1.6314	1.6297

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.1385	2.1740	2.1722	2.1798	2.1798
1	French	2.2920	2.2888	2.3142	2.3142	2.3142
2	English	0.6112	0.6215	0.6366	0.6366	0.6366
3	German	2.0361	2.0500	2.0607	2.0607	2.0607
4	French	2.2520	2.2593	2.2678	2.2660	2.2678
5	French	1.8327	1.9253	1.9253	1.9678	1.9678
6	English	1.7101	1.7303	1.7422	1.7427	1.7427
7	German	1.1290	1.1300	1.2866	1.2994	1.3176
8	Spanish	1.8750	1.8928	1.9125	1.9359	1.9359
9	French	1.7162	1.8875	1.8988	1.8988	1.9018

v14 Warm−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.6689	1.7077	1.7273	1.7379	1.7454
Std Dev	0.5072	0.5098	0.4944	0.4973	0.4949
Avg Mean	1.5668	1.5781	1.5391	1.5523	1.5510

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.0806	2.1079	2.1412	2.1218	2.1412
1	French	2.0684	2.0769	2.0838	2.0832	2.0976
2	English	0.5469	0.5528	0.5596	0.5528	0.5596
3	German	2.0888	2.1274	2.1112	2.1265	2.1274
4	French	2.1134	2.1134	2.1140	2.1282	2.1282
5	French	1.6897	1.7711	1.7711	1.8079	1.8079
6	English	1.6168	1.6501	1.6694	1.6694	1.6694
7	German	1.1035	1.1190	1.2542	1.2641	1.2978
8	Spanish	1.8116	1.8217	1.8366	1.8682	1.8682
9	French	1.5695	1.7362	1.7319	1.7569	1.7569

v15 Natural−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.8332	1.8758	1.9142	1.9200	1.9358
Std Dev	0.5389	0.5460	0.5315	0.5315	0.5339
Avg Mean	1.7050	1.7186	1.6716	1.6867	1.6849

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.8996	1.9782	2.0171	2.0119	2.0586
1	French	2.4417	2.4035	2.4509	2.4509	2.4509
2	English	0.6400	0.6458	0.6537	0.6537	0.6537
3	German	2.2766	2.2829	2.2951	2.3095	2.3099
4	French	2.3222	2.3386	2.4143	2.3708	2.4143
5	French	1.8268	1.9063	1.9629	2.0193	2.0193
6	English	1.8211	1.8898	1.8184	1.8459	1.8898
7	German	1.2413	1.2137	1.4525	1.4290	1.4525
8	Spanish	2.0023	2.0249	1.9953	2.0268	2.0268
9	French	1.8603	2.0738	2.0820	2.0820	2.0820

v16 Authentic−Cheap (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.8533	1.8982	1.9543	1.9610	1.9857
Std Dev	0.5464	0.5686	0.5648	0.5588	0.5600
Avg Mean	1.7071	1.7279	1.6767	1.6943	1.6907

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.7692	1.7692	1.8370	1.8730	1.9137
1	French	2.5375	2.4476	2.5375	2.5488	2.5488
2	English	0.6677	0.6403	0.6534	0.6537	0.6677
3	German	2.3305	2.3969	2.4266	2.4390	2.4390
4	French	2.1874	2.2907	2.3943	2.3109	2.3943
5	French	2.0455	2.1373	2.1570	2.2059	2.2059
6	English	1.6615	1.7937	1.7499	1.7499	1.8007
7	German	1.3056	1.2638	1.4821	1.5207	1.5207
8	Spanish	2.1587	2.2565	2.2621	2.2649	2.2649
9	French	1.8695	1.9855	2.0435	2.0435	2.1015

v17 Professional−Distorted (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.7020	1.7347	1.7577	1.7740	1.7744
Std Dev	0.5263	0.5350	0.5146	0.5173	0.5176
Avg Mean	1.5803	1.5863	1.5439	1.5592	1.5577

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.0713	2.1246	2.1426	2.1426	2.1426
1	French	2.1997	2.1827	2.1551	2.1993	2.1998
2	English	0.5479	0.5369	0.5557	0.5586	0.5586
3	German	2.0820	2.0620	2.0820	2.0955	2.0992
4	French	2.2254	2.2799	2.2824	2.2824	2.2824
5	French	1.6914	1.7538	1.8142	1.8170	1.8170
6	English	1.6814	1.6787	1.6857	1.7444	1.7444
7	German	1.1067	1.1287	1.2695	1.2812	1.2812
8	Spanish	1.7463	1.7479	1.7576	1.7615	1.7615
9	French	1.6674	1.8516	1.8325	1.8574	1.8574

v18 Expressive−Flat (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.5764	1.6263	1.6467	1.6868	1.7039
Std Dev	0.4879	0.4754	0.4767	0.4709	0.4733
Avg Mean	1.4591	1.4602	1.4323	1.4394	1.4391

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.0212	2.0794	2.0556	2.0456	2.0794
1	French	1.8202	1.7948	1.9501	1.9501	1.9501
2	English	0.5053	0.5463	0.5266	0.5463	0.5466
3	German	1.7204	1.8126	1.8176	1.8234	1.8234
4	French	1.8198	1.8198	1.9155	1.9016	1.9155
5	French	1.6916	1.7813	1.7856	1.9195	1.9195
6	English	1.6406	1.6957	1.6320	1.7379	1.7416
7	German	0.9830	1.0688	1.1472	1.2472	1.2472
8	Spanish	2.1011	2.0943	2.0439	2.1034	2.1034
9	French	1.4611	1.5701	1.5930	1.5930	1.7122

v19 FullPos−FullNeg (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.6394	1.6768	1.7053	1.7219	1.7244
Std Dev	0.4773	0.4861	0.4686	0.4662	0.4669
Avg Mean	1.5296	1.5378	1.4985	1.5121	1.5109

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.8933	1.9317	1.9910	1.9910	1.9910
1	French	2.1025	2.1021	2.0847	2.1204	2.1204
2	English	0.5753	0.5721	0.5782	0.5926	0.5926
3	German	2.0274	1.9954	2.0274	2.0190	2.0274
4	French	2.0485	2.1151	2.0990	2.1112	2.1151
5	French	1.6047	1.6975	1.7564	1.7714	1.7714
6	English	1.5931	1.6588	1.6046	1.6462	1.6588
7	German	1.1161	1.0997	1.2818	1.3035	1.3035
8	Spanish	1.8430	1.8430	1.8656	1.8656	1.8656
9	French	1.5902	1.7528	1.7639	1.7977	1.7977

v20 Warm−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.7189	1.7620	1.8005	1.8227	1.8289
Std Dev	0.5457	0.5518	0.5512	0.5440	0.5498
Avg Mean	1.5933	1.6016	1.5596	1.5732	1.5714

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.0912	2.1808	2.2270	2.2070	2.2270
1	French	2.2332	2.1770	2.2332	2.2332	2.2332
2	English	0.5420	0.5532	0.5588	0.5659	0.5659
3	German	2.1429	2.1766	2.1837	2.2188	2.2188
4	French	2.2308	2.2308	2.3380	2.2960	2.3380
5	French	1.7326	1.8033	1.8033	1.8954	1.8954
6	English	1.6905	1.7608	1.7135	1.7750	1.7750
7	German	1.0509	1.0436	1.2098	1.2258	1.2258
8	Spanish	1.7668	1.7853	1.7630	1.8356	1.8356
9	French	1.7081	1.9089	1.9744	1.9744	1.9744

v21 Sanitized Prompt (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.1563	1.1783	1.1981	1.2061	1.2133
Std Dev	0.3575	0.3593	0.3532	0.3492	0.3466
Avg Mean	1.0756	1.0813	1.0557	1.0632	1.0623

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.5672	1.5740	1.5768	1.5808	1.5879
1	French	1.4746	1.4709	1.5038	1.5038	1.5038
2	English	0.4306	0.4283	0.4310	0.4330	0.4458
3	German	1.3502	1.3902	1.3987	1.3817	1.3987
4	French	1.2544	1.2620	1.2808	1.2993	1.2993
5	French	1.3296	1.3911	1.4057	1.4213	1.4213
6	English	1.2152	1.2188	1.2114	1.2237	1.2482
7	German	0.6519	0.6742	0.7445	0.7784	0.7897
8	Spanish	1.2399	1.2323	1.2370	1.2475	1.2475
9	French	1.0491	1.1412	1.1912	1.1912	1.1912

v22 Sanitized Prompt (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.8343	0.8474	0.8450	0.8684	0.8791
Std Dev	0.2867	0.2840	0.2880	0.2861	0.2842
Avg Mean	0.7475	0.7419	0.7211	0.7283	0.7283

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.3355	1.3466	1.3529	1.3483	1.3529
1	French	0.8675	0.8810	0.8792	0.8801	0.8852
2	English	0.3241	0.2935	0.3057	0.3064	0.3241
3	German	1.0421	1.0477	1.0710	1.0743	1.1084
4	French	0.7756	0.7756	0.7853	0.8179	0.8179
5	French	0.8878	0.9857	0.9857	0.9709	0.9857
6	English	1.1032	1.0235	1.0265	1.1181	1.1181
7	German	0.5729	0.6159	0.6138	0.6556	0.6856
8	Spanish	0.7867	0.7790	0.7439	0.7867	0.7878
9	French	0.6474	0.7256	0.6859	0.7256	0.7256

v23 Sanitized−Uncanny (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.9570	1.9875	2.0213	2.0312	2.0425
Std Dev	0.5924	0.5927	0.5796	0.5772	0.5721
Avg Mean	1.8276	1.8389	1.7934	1.8075	1.8061
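
The diminishing returns in the Avg Best row can be quantified as reward gained per doubling of N. The sketch below is one way to do that and is not necessarily the metric used by the report; `gain_per_doubling` is a hypothetical helper.

```python
import math

# Avg Best row of the v23 (Large) table above.
n_values = [5, 10, 25, 50, 100]
avg_best = [1.9570, 1.9875, 2.0213, 2.0312, 2.0425]

def gain_per_doubling(ns, values):
    """Return (n_from, n_to, gain per doubling of N) for consecutive columns."""
    rows = []
    for i in range(len(ns) - 1):
        # log2 normalises the uneven 10 -> 25 step against the 2x steps.
        doublings = math.log2(ns[i + 1] / ns[i])
        rows.append((ns[i], ns[i + 1], (values[i + 1] - values[i]) / doublings))
    return rows

gains = gain_per_doubling(n_values, avg_best)
for n_from, n_to, gain in gains:
    print(f"N={n_from:>3} -> {n_to:>3}: {gain:+.4f} reward per doubling")
```

For v23 the gain per doubling falls from about 0.03 between N=5 and N=10 to about 0.01 beyond N=25. The total improvement from N=5 to N=100 is 0.0855, roughly 4% of the N=5 value.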

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.5489	2.5576	2.5751	2.5751	2.5751
1	French	2.4663	2.4520	2.4953	2.4953	2.4953
2	English	0.6718	0.6710	0.6773	0.6779	0.6916
3	German	2.3168	2.3578	2.3692	2.3506	2.3692
4	French	2.2387	2.2399	2.2572	2.2780	2.2780
5	French	2.1723	2.2625	2.2870	2.3178	2.3178
6	English	1.9706	1.9576	1.9602	1.9802	2.0415
7	German	1.1984	1.2290	1.3814	1.4138	1.4337
8	Spanish	2.1705	2.1750	2.1815	2.1948	2.1948
9	French	1.8154	1.9727	2.0283	2.0283	2.0283

v24 Sanitized−Uncanny (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.7907	1.8339	1.8565	1.9014	1.9077
Std Dev	0.5481	0.5641	0.5515	0.5483	0.5450
Avg Mean	1.6408	1.6505	1.6121	1.6259	1.6234

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.5327	2.5327	2.5217	2.5356	2.5356
1	French	2.1469	2.1527	2.1980	2.1980	2.1980
2	English	0.6129	0.5430	0.5813	0.5936	0.6129
3	German	2.2187	2.2424	2.2300	2.3312	2.3312
4	French	1.9258	2.0512	2.1278	2.0837	2.1278
5	French	1.9517	2.1434	2.1434	2.1468	2.1468
6	English	1.9735	1.9077	1.8959	2.0760	2.0760
7	German	1.2437	1.3207	1.3930	1.5221	1.5221
8	Spanish	1.7914	1.7675	1.8137	1.8490	1.8490
9	French	1.5095	1.6777	1.6600	1.6777	1.6777

Prompt #0 — English (Silicon Valley accent)

Language: English · Accent: Silicon Valley accent · Scored: 100/100

DramaBox Prompt

A young woman, possessing an extremely high fundamental frequency and bright, delicate harmonic texture, with a brisk, elevated momentum and a Silicon Valley accent; this is a pristine, high-quality studio voice recording with no background noise. She delivers the lines with a teasing lightness that occasionally borders on nervous energy, punctuated by small moments of genuine relief. (A brief, high-pitched Giggle escapes as she begins.) "Honestly, you think finding a solid Firestone review is that hard? Boggle, really. But look, that Lys thing actually worked." (She pauses, a subtle Contemplation washing over her features, then manages a slight, contained Chuckle.) "Just wait, I'll show you." She concludes with a soft, almost satisfied sigh, allowing the tension to dissipate.

Prompt #1 — French

Language: French · Scored: 100/100

DramaBox Prompt

High-pitched, delicately resonant, and possessing the slightly strained clarity of a young adult female soprano; the voice is bright and purely head-dominant, engineered for intimate projection.

Pauses briefly, gathering strength. "Malgré la profondeur de cette sombre forêt, je sens toujours cette confiance absolue en mon chemin, guidée par la lumière."
A slight, almost imperceptible hardening of tone. "Même au cœur de cette nuit insondable, ma boussole intérieure me montre la seule direction véritable."
She finishes, a note of unwavering certainty settling.

The pace remains glacially slow throughout the utterance. The delivery conveys immense, quiet self-assurance.
