Best-of-N Analysis: Diminishing Returns

10 Path A prompts × 100 candidates, 29 ranking methods

Methodology: Ranking Method Formulas & Text Prompts

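All methods use reward = (1 − WER) × max(score, 0), where WER is the word error rate and cos(·, ·) is the cosine similarity between the audio embedding and a text embedding. Only the score differs between methods, as listed below. Keys marked L (clap_l*, *_L) use VoiceCLAP-Large; keys marked S (clap_s*, *_S) use VoiceCLAP-Small.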
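
The reward formula can be sketched in a few lines. The function name `reward` and the sample inputs are illustrative assumptions, not taken from the page's own code:

```python
def reward(wer: float, score: float) -> float:
    """Combine intelligibility and quality: (1 - WER) * max(score, 0)."""
    # Negative scores (possible for the contrastive methods) are clamped to
    # zero, so the score term never contributes a negative value.
    return (1.0 - wer) * max(score, 0.0)


# Hypothetical candidates given as (WER, score).
print(reward(0.0, 0.9))    # perfect transcript keeps the full score
print(reward(0.25, 0.8))   # a WER of 0.25 scales the score by 0.75
print(reward(0.1, -0.3))   # negative score is clamped, so the reward is 0
```

Note that WER can exceed 1 when a transcript contains many insertions; the formula as stated does not clamp that case, and neither does this sketch.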

#	Key	Score Formula	Text Prompt(s)
0	standard	Content Enjoyment	—
1	clap_lq	cos(audio, quality_text)	"pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
2	clap_sq	cos(audio, quality_text)	"pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
3	clap_lp	cos(audio, prompt)	Original DramaBox prompt
4	clap_sp	cos(audio, prompt)	Original DramaBox prompt
5	v1_nat_L	cos(audio, nat)	"natural, spontaneous, lifelike speech with genuine emotion"
6	v2_auth_L	cos(audio, auth)	"authentic, emotionally truthful, deeply felt voice performance"
7	v3_pro_L	cos(audio, pro)	"professional studio recording, crystal clear high-fidelity audio"
8	v4_expr_L	cos(audio, expr)	"expressive, dynamic voice acting with rich emotional range"
9	v5_cine_L	cos(audio, cine)	"immersive cinematic narration, compelling storytelling"
10	v6_nat_S	cos(audio, nat)	"natural, spontaneous, lifelike speech with genuine emotion"
11	v7_auth_S	cos(audio, auth)	"authentic, emotionally truthful, deeply felt voice performance"
12	v8_pro_S	cos(audio, pro)	"professional studio recording, crystal clear high-fidelity audio"
13	v9_nr_L	cos(audio, nat) − cos(audio, rob)	+ "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
14	v10_ac_L	cos(audio, auth) − cos(audio, cheap)	+ "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
15	v11_pd_L	cos(audio, pro) − cos(audio, dist)	+ "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
16	v12_ef_L	cos(audio, expr) − cos(audio, flat)	+ "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
17	v13_ff_L	cos(audio, full_pos) − cos(audio, full_neg)	+ "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
18	v14_wr_L	cos(audio, warm) − cos(audio, rob)	+ "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
19	v15_nr_S	cos(audio, nat) − cos(audio, rob)	+ "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
20	v16_ac_S	cos(audio, auth) − cos(audio, cheap)	+ "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
21	v17_pd_S	cos(audio, pro) − cos(audio, dist)	+ "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
22	v18_ef_S	cos(audio, expr) − cos(audio, flat)	+ "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
23	v19_ff_S	cos(audio, full_pos) − cos(audio, full_neg)	+ "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
24	v20_wr_S	cos(audio, warm) − cos(audio, rob)	+ "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
25	v21_san_L	cos(audio, sanitized_prompt)	Quoted speech removed (Large)
26	v22_san_S	cos(audio, sanitized_prompt)	Quoted speech removed (Small)
27	v23_snr_L	cos(audio, sanitized) − cos(audio, neg_san)	Sanitized / − "robotic, distorted, uncanny" (Large)
28	v24_snr_S	cos(audio, sanitized) − cos(audio, neg_san)	Sanitized / − "robotic, distorted, uncanny" (Small)
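
The two score shapes in the table (a single cosine similarity, and a positive-minus-negative contrast) might be computed as below. This is a sketch under stated assumptions: the toy 3-dimensional vectors stand in for real VoiceCLAP audio and text embeddings, and the helper names `cosine`, `single_score`, and `contrastive_score` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def single_score(audio, text):
    """Rows 1-12 and 25-26: cos(audio, text)."""
    return cosine(audio, text)


def contrastive_score(audio, pos, neg):
    """Rows 13-24 and 27-28: cos(audio, pos) - cos(audio, neg)."""
    return cosine(audio, pos) - cosine(audio, neg)


# Toy embeddings (hypothetical values, not real model outputs).
audio = [0.9, 0.1, 0.0]
natural = [1.0, 0.0, 0.0]
robotic = [0.0, 1.0, 0.0]

print(single_score(audio, natural))
print(contrastive_score(audio, natural, robotic))
```

Several single-text methods report rewards above 1 in the tables that follow, which a raw cosine multiplied by (1 − WER) cannot produce, so the page evidently rescales the similarity in some way. That scaling is not stated and is not reproduced here.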

Cross-Method Diminishing Returns Comparison

Method	N=5	N=10	N=25	N=50	N=100	Gain N=5→100	Knee Point
Standard: (1−WER) × Content Enjoyment	3.9546	4.0007	4.0772	4.1326	4.1640	+0.2093	N=50
VoiceCLAP-Large × Quality Text	0.9101	0.9363	0.9531	0.9551	0.9596	+0.0495	N=50
VoiceCLAP-Small × Quality Text	0.7434	0.7655	0.7740	0.7969	0.7989	+0.0555	N=25
VoiceCLAP-Large × Prompt Match	1.3972	1.4204	1.4363	1.4510	1.4585	+0.0612	N=25
VoiceCLAP-Small × Prompt Match	0.8601	0.8677	0.8774	0.8891	0.9028	+0.0426	N=25
v1 Natural (Large)	0.9750	1.0031	1.0236	1.0246	1.0333	+0.0583	N=50
v2 Authentic (Large)	1.0092	1.0361	1.0536	1.0591	1.0631	+0.0539	N=50
v3 Professional (Large)	0.8574	0.8850	0.9008	0.9019	0.9088	+0.0513	N=50
v4 Expressive (Large)	1.0185	1.0361	1.0584	1.0717	1.0756	+0.0571	N=50
v5 Cinematic (Large)	0.9942	1.0057	1.0249	1.0338	1.0391	+0.0449	N=50
v6 Natural (Small)	0.7754	0.8111	0.8265	0.8481	0.8562	+0.0807	N=100
v7 Authentic (Small)	0.7471	0.7727	0.7910	0.8014	0.8104	+0.0634	N=50
v8 Professional (Small)	0.6180	0.6390	0.6422	0.6624	0.6708	+0.0528	N=25
v9 Natural−Robotic (Large)	1.7588	1.7906	1.8169	1.8278	1.8339	+0.0751	N=25
v10 Authentic−Cheap (Large)	1.8458	1.8761	1.9049	1.9166	1.9189	+0.0731	N=50
v11 Professional−Distorted (Large)	1.7378	1.7827	1.8108	1.8153	1.8213	+0.0835	N=50
v12 Expressive−Flat (Large)	1.7552	1.7865	1.8133	1.8333	1.8406	+0.0854	N=50
v13 FullPos−FullNeg (Large)	1.7552	1.7934	1.8229	1.8280	1.8360	+0.0808	N=50
v14 Warm−Robotic (Large)	1.6719	1.7076	1.7300	1.7401	1.7497	+0.0778	N=25
v15 Natural−Robotic (Small)	1.8316	1.8742	1.9115	1.9234	1.9383	+0.1067	N=50
v16 Authentic−Cheap (Small)	1.8657	1.8969	1.9666	1.9741	1.9951	+0.1294	N=50
v17 Professional−Distorted (Small)	1.7016	1.7315	1.7590	1.7725	1.7755	+0.0739	N=50
v18 Expressive−Flat (Small)	1.5761	1.6173	1.6483	1.6891	1.7099	+0.1337	N=100
v19 FullPos−FullNeg (Small)	1.6357	1.6796	1.7039	1.7201	1.7248	+0.0892	N=25
v20 Warm−Robotic (Small)	1.7297	1.7715	1.8069	1.8281	1.8450	+0.1153	N=50
v21 Sanitized Prompt (Large)	1.1532	1.1751	1.1969	1.2031	1.2158	+0.0625	N=50
v22 Sanitized Prompt (Small)	0.8346	0.8339	0.8457	0.8624	0.8819	+0.0473	N=10
v23 Sanitized−Uncanny (Large)	1.9507	1.9826	2.0204	2.0303	2.0413	+0.0906	N=50
v24 Sanitized−Uncanny (Small)	1.8188	1.8346	1.8690	1.9133	1.9251	+0.1063	N=50
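
The Gain column is an absolute difference, and the methods live on different scales (roughly 4 for Standard, below 1 for most single-text methods, near 2 for contrastive ones). A relative gain may therefore be easier to compare across rows. The sketch below recomputes both for a few rows copied from the table; the function name `gains` is an assumption.

```python
# Avg Best at N=5 and N=100, copied from the table above.
avg_best = {
    "Standard": (3.9546, 4.1640),
    "VoiceCLAP-Large x Quality Text": (0.9101, 0.9596),
    "v16 Authentic-Cheap (Small)": (1.8657, 1.9951),
    "v18 Expressive-Flat (Small)": (1.5761, 1.7099),
}


def gains(best_n5, best_n100):
    """Return (absolute gain, gain relative to the N=5 value)."""
    absolute = best_n100 - best_n5
    return absolute, absolute / best_n5


for name, (n5, n100) in avg_best.items():
    absolute, relative = gains(n5, n100)
    print(f"{name}: +{absolute:.4f} ({relative:.1%})")
```

Recomputed values can differ from the table in the last digit because the table was presumably built from unrounded rewards.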

[Figure: Diminishing Returns, All Methods Overlaid]

Marginal Improvement per Additional Candidate

Method	N=5→10	N=10→25	N=25→50	N=50→100
Standard: (1−WER) × Content Enjoyment	0.00920/cand (1.2%)	0.00510/cand (1.9%)	0.00222/cand (1.4%)	0.00063/cand (0.8%)
VoiceCLAP-Large × Quality Text	0.00524/cand (2.9%)	0.00112/cand (1.8%)	0.00008/cand (0.2%)	0.00009/cand (0.5%)
VoiceCLAP-Small × Quality Text	0.00441/cand (3.0%)	0.00057/cand (1.1%)	0.00092/cand (3.0%)	0.00004/cand (0.2%)
VoiceCLAP-Large × Prompt Match	0.00463/cand (1.7%)	0.00106/cand (1.1%)	0.00059/cand (1.0%)	0.00015/cand (0.5%)
VoiceCLAP-Small × Prompt Match	0.00152/cand (0.9%)	0.00065/cand (1.1%)	0.00047/cand (1.3%)	0.00027/cand (1.5%)
v1 Natural (Large)	0.00561/cand (2.9%)	0.00137/cand (2.0%)	0.00004/cand (0.1%)	0.00018/cand (0.9%)
v2 Authentic (Large)	0.00538/cand (2.7%)	0.00117/cand (1.7%)	0.00022/cand (0.5%)	0.00008/cand (0.4%)
v3 Professional (Large)	0.00552/cand (3.2%)	0.00105/cand (1.8%)	0.00005/cand (0.1%)	0.00014/cand (0.8%)
v4 Expressive (Large)	0.00352/cand (1.7%)	0.00149/cand (2.2%)	0.00053/cand (1.3%)	0.00008/cand (0.4%)
v5 Cinematic (Large)	0.00229/cand (1.2%)	0.00128/cand (1.9%)	0.00036/cand (0.9%)	0.00011/cand (0.5%)
v6 Natural (Small)	0.00714/cand (4.6%)	0.00103/cand (1.9%)	0.00086/cand (2.6%)	0.00016/cand (0.9%)
v7 Authentic (Small)	0.00512/cand (3.4%)	0.00122/cand (2.4%)	0.00042/cand (1.3%)	0.00018/cand (1.1%)
v8 Professional (Small)	0.00420/cand (3.4%)	0.00022/cand (0.5%)	0.00081/cand (3.1%)	0.00017/cand (1.3%)
v9 Natural−Robotic (Large)	0.00636/cand (1.8%)	0.00175/cand (1.5%)	0.00044/cand (0.6%)	0.00012/cand (0.3%)
v10 Authentic−Cheap (Large)	0.00606/cand (1.6%)	0.00192/cand (1.5%)	0.00047/cand (0.6%)	0.00005/cand (0.1%)
v11 Professional−Distorted (Large)	0.00898/cand (2.6%)	0.00187/cand (1.6%)	0.00018/cand (0.2%)	0.00012/cand (0.3%)
v12 Expressive−Flat (Large)	0.00626/cand (1.8%)	0.00179/cand (1.5%)	0.00080/cand (1.1%)	0.00015/cand (0.4%)
v13 FullPos−FullNeg (Large)	0.00764/cand (2.2%)	0.00196/cand (1.6%)	0.00020/cand (0.3%)	0.00016/cand (0.4%)
v14 Warm−Robotic (Large)	0.00712/cand (2.1%)	0.00150/cand (1.3%)	0.00040/cand (0.6%)	0.00019/cand (0.6%)
v15 Natural−Robotic (Small)	0.00851/cand (2.3%)	0.00249/cand (2.0%)	0.00048/cand (0.6%)	0.00030/cand (0.8%)
v16 Authentic−Cheap (Small)	0.00622/cand (1.7%)	0.00465/cand (3.7%)	0.00030/cand (0.4%)	0.00042/cand (1.1%)
v17 Professional−Distorted (Small)	0.00599/cand (1.8%)	0.00183/cand (1.6%)	0.00054/cand (0.8%)	0.00006/cand (0.2%)
v18 Expressive−Flat (Small)	0.00824/cand (2.6%)	0.00207/cand (1.9%)	0.00163/cand (2.5%)	0.00042/cand (1.2%)
v19 FullPos−FullNeg (Small)	0.00879/cand (2.7%)	0.00162/cand (1.4%)	0.00065/cand (1.0%)	0.00009/cand (0.3%)
v20 Warm−Robotic (Small)	0.00836/cand (2.4%)	0.00236/cand (2.0%)	0.00085/cand (1.2%)	0.00034/cand (0.9%)
v21 Sanitized Prompt (Large)	0.00438/cand (1.9%)	0.00145/cand (1.9%)	0.00025/cand (0.5%)	0.00025/cand (1.1%)
v22 Sanitized Prompt (Small)	-0.00014/cand (-0.1%)	0.00079/cand (1.4%)	0.00067/cand (2.0%)	0.00039/cand (2.3%)
v23 Sanitized−Uncanny (Large)	0.00639/cand (1.6%)	0.00252/cand (1.9%)	0.00040/cand (0.5%)	0.00022/cand (0.5%)
v24 Sanitized−Uncanny (Small)	0.00316/cand (0.9%)	0.00230/cand (1.9%)	0.00177/cand (2.4%)	0.00023/cand (0.6%)
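
The entries in this table appear consistent with two quantities per interval: the change in Avg Best divided by the number of added candidates, and the same change as a percentage of the Avg Best at the start of the interval. A minimal sketch that recomputes the Standard row from its Avg Best values (the function name `marginal_improvements` is an assumption):

```python
NS = [5, 10, 25, 50, 100]


def marginal_improvements(avg_best):
    """Per-interval (n1, n2, gain per added candidate, percent gain)."""
    out = []
    for i in range(len(NS) - 1):
        n1, n2 = NS[i], NS[i + 1]
        delta = avg_best[i + 1] - avg_best[i]
        per_candidate = delta / (n2 - n1)
        percent = 100.0 * delta / avg_best[i]
        out.append((n1, n2, per_candidate, percent))
    return out


# Avg Best for the Standard method at N = 5, 10, 25, 50, 100.
standard = [3.9546, 4.0007, 4.0772, 4.1326, 4.1640]
for n1, n2, per_cand, pct in marginal_improvements(standard):
    print(f"N={n1}->{n2}: {per_cand:.5f}/cand ({pct:.1f}%)")
```

Small last-digit differences from the table (for example 0.00922 versus 0.00920) are expected when starting from the rounded four-decimal values.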

Ablation: Pronunciation Suffix Effect

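Compares N=10 candidates generated without the pronunciation suffix against N=10 candidates generated with it. Each delta is With Suffix minus Without Suffix, so a negative value means the suffix lowered the reward. Note that every VoiceCLAP row reports identical With Suffix values (0.7575 / 0.8228 / 0.7704 for single-text methods and about twice that for contrastive methods), so the deltas for those rows should be read with caution.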

Ranking Method	No Suffix Mean	No Suffix Best	No Suffix Median	With Suffix Mean	With Suffix Best	With Suffix Median	Δ Mean	Δ Best
Standard: (1−WER) × Content Enjoyment	3.7109	4.0097	3.7324	3.6732	4.0604	3.7128	-0.0377	+0.0507
VoiceCLAP-Large × Quality Text	0.8616	0.9271	0.8699	0.7575	0.8228	0.7704	-0.1040	-0.1043
VoiceCLAP-Small × Quality Text	0.6581	0.7482	0.6658	0.7575	0.8228	0.7704	+0.0995	+0.0746
VoiceCLAP-Large × Prompt Match	1.3126	1.4088	1.3272	0.7575	0.8228	0.7704	-0.5551	-0.5860
VoiceCLAP-Small × Prompt Match	0.7722	0.8780	0.7722	0.7575	0.8228	0.7704	-0.0147	-0.0552
v1 Natural (Large)	0.9228	0.9954	0.9296	0.7575	0.8228	0.7704	-0.1653	-0.1726
v2 Authentic (Large)	0.9555	1.0287	0.9655	0.7575	0.8228	0.7704	-0.1979	-0.2059
v3 Professional (Large)	0.8112	0.8743	0.8188	0.7575	0.8228	0.7704	-0.0537	-0.0515
v4 Expressive (Large)	0.9502	1.0332	0.9583	0.7575	0.8228	0.7704	-0.1927	-0.2104
v5 Cinematic (Large)	0.9240	0.9983	0.9311	0.7575	0.8228	0.7704	-0.1664	-0.1755
v6 Natural (Small)	0.7164	0.7915	0.7249	0.7575	0.8228	0.7704	+0.0411	+0.0313
v7 Authentic (Small)	0.6906	0.7536	0.7012	0.7575	0.8228	0.7704	+0.0669	+0.0692
v8 Professional (Small)	0.5511	0.6205	0.5614	0.7575	0.8228	0.7704	+0.2064	+0.2023
v9 Natural−Robotic (Large)	1.6632	1.7826	1.6782	1.5151	1.6456	1.5408	-0.1481	-0.1370
v10 Authentic−Cheap (Large)	1.7424	1.8624	1.7614	1.5151	1.6456	1.5408	-0.2273	-0.2168
v11 Professional−Distorted (Large)	1.6475	1.7658	1.6623	1.5151	1.6456	1.5408	-0.1324	-0.1203
v12 Expressive−Flat (Large)	1.6454	1.7703	1.6586	1.5151	1.6456	1.5408	-0.1304	-0.1248
v13 FullPos−FullNeg (Large)	1.6652	1.7810	1.6805	1.5151	1.6456	1.5408	-0.1502	-0.1355
v14 Warm−Robotic (Large)	1.5847	1.6941	1.5959	1.5151	1.6456	1.5408	-0.0696	-0.0485
v15 Natural−Robotic (Small)	1.7221	1.8449	1.7406	1.5151	1.6456	1.5408	-0.2070	-0.1993
v16 Authentic−Cheap (Small)	1.7422	1.8908	1.7639	1.5151	1.6456	1.5408	-0.2271	-0.2452
v17 Professional−Distorted (Small)	1.5965	1.7306	1.6031	1.5151	1.6456	1.5408	-0.0814	-0.0850
v18 Expressive−Flat (Small)	1.4722	1.6289	1.4714	1.5151	1.6456	1.5408	+0.0428	+0.0167
v19 FullPos−FullNeg (Small)	1.5419	1.6612	1.5577	1.5151	1.6456	1.5408	-0.0269	-0.0156
v20 Warm−Robotic (Small)	1.6154	1.7658	1.6233	1.5151	1.6456	1.5408	-0.1004	-0.1202
v21 Sanitized Prompt (Large)	1.0835	1.1674	1.0950	0.7575	0.8228	0.7704	-0.3260	-0.3446
v22 Sanitized Prompt (Small)	0.7377	0.8380	0.7399	0.7575	0.8228	0.7704	+0.0199	-0.0152
v23 Sanitized−Uncanny (Large)	1.8437	1.9842	1.8619	1.5151	1.6456	1.5408	-0.3286	-0.3386
v24 Sanitized−Uncanny (Small)	1.6793	1.8623	1.6847	1.5151	1.6456	1.5408	-0.1643	-0.2167

Per-Prompt Ablation: Standard Reward (N=10)

#	Lang	No Suffix Mean	No Suffix Best	With Suffix Mean	With Suffix Best	Δ Mean	Δ Best
0	English	4.4816	4.6844	4.3707	4.8535	-0.1109	+0.1691
1	French	4.7925	5.0537	4.7892	5.0296	-0.0034	-0.0241
2	English	1.0827	1.2962	1.1411	1.3218	+0.0585	+0.0256
3	German	4.8200	4.9088	4.7944	4.9058	-0.0256	-0.0030
4	French	4.5691	4.7517	4.5799	4.8368	+0.0107	+0.0851
5	French	3.7270	4.2955	3.8205	4.6849	+0.0936	+0.3894
6	English	3.0129	3.3577	3.0892	3.7102	+0.0763	+0.3525
7	German	2.3007	2.9392	2.1858	2.5297	-0.1149	-0.4095
8	Spanish	4.7659	4.9141	4.7331	4.8429	-0.0327	-0.0712
9	French	3.5568	3.8958	3.2281	3.8884	-0.3288	-0.0074
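
Averaging these per-prompt differences reproduces the Standard row of the aggregate ablation table (Δ Mean −0.0377, Δ Best +0.0507). The sketch below does that arithmetic on the values copied from the table above; the variable names are illustrative.

```python
from statistics import mean

# Per prompt: (no-suffix mean, no-suffix best, with-suffix mean, with-suffix best).
rows = [
    (4.4816, 4.6844, 4.3707, 4.8535),
    (4.7925, 5.0537, 4.7892, 5.0296),
    (1.0827, 1.2962, 1.1411, 1.3218),
    (4.8200, 4.9088, 4.7944, 4.9058),
    (4.5691, 4.7517, 4.5799, 4.8368),
    (3.7270, 4.2955, 3.8205, 4.6849),
    (3.0129, 3.3577, 3.0892, 3.7102),
    (2.3007, 2.9392, 2.1858, 2.5297),
    (4.7659, 4.9141, 4.7331, 4.8429),
    (3.5568, 3.8958, 3.2281, 3.8884),
]

# Delta = with suffix minus without suffix, averaged over prompts.
delta_mean = mean(w_mean - n_mean for n_mean, _, w_mean, _ in rows)
delta_best = mean(w_best - n_best for _, n_best, _, w_best in rows)
improved_best = sum(1 for _, n_best, _, w_best in rows if w_best > n_best)

print(f"delta mean: {delta_mean:+.4f}")
print(f"delta best: {delta_best:+.4f}")
print(f"prompts whose best improved: {improved_best} of {len(rows)}")
```

The aggregate Δ Best is positive even though only half of the prompts improve, because the gains on prompts 5 and 6 outweigh the loss on prompt 7.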

Standard: (1−WER) × Content Enjoyment — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	3.9546	4.0007	4.0772	4.1326	4.1640
Std Dev	1.2168	1.2467	1.2122	1.2253	1.2061
Avg Mean	3.6288	3.6854	3.6552	3.6658	3.6684
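
The page does not state how the N < 100 subsets were drawn from each 100-candidate pool. Some per-prompt values dip as N grows, which suggests individual random subsets rather than an expectation. For reference, the exact expected best-of-N over all subsets of a pool can be computed in closed form. This is a sketch of that alternative estimator, not the page's method, and the pool values are hypothetical.

```python
from math import comb


def expected_best_of_n(rewards, n):
    """Exact expected maximum of n rewards drawn without replacement."""
    pool = sorted(rewards)
    m = len(pool)
    if not 1 <= n <= m:
        raise ValueError("n must be between 1 and the pool size")
    # The k-th smallest value (1-based) is the maximum of a uniformly random
    # n-subset with probability C(k-1, n-1) / C(m, n).
    weighted = sum(x * comb(k - 1, n - 1) for k, x in enumerate(pool, start=1))
    return weighted / comb(m, n)


# Hypothetical reward pool for one prompt.
pool = [1.0, 2.0, 3.0, 4.0]
for n in (1, 2, 4):
    print(n, expected_best_of_n(pool, n))
```

Unlike a single random draw, this estimator is guaranteed to be non-decreasing in N, and at N equal to the pool size it returns the pool maximum.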

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	4.8495	4.8495	4.8853	4.9242	4.9242
1	French	4.8763	4.9786	5.0537	5.0537	5.0537
2	English	1.3062	1.3142	1.3050	1.3045	1.3245
3	German	4.9532	4.9336	4.9589	4.9589	4.9703
4	French	4.5461	4.7517	4.7825	4.8121	4.8121
5	French	4.2109	4.3700	4.3700	4.6713	4.7707
6	English	3.8101	3.6776	3.7326	3.9251	3.9251
7	German	2.5013	2.4460	2.7952	2.7806	2.9489
8	Spanish	4.9337	4.9628	4.9645	4.9645	4.9645
9	French	3.5592	3.7227	3.9241	3.9309	3.9459
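
The Avg Best and Std Dev rows can be recomputed from these per-prompt values. At N=100 the reported figures (4.1640 and 1.2061) match the mean and the sample standard deviation (n − 1 denominator) across the 10 prompts, as this short check shows:

```python
from statistics import mean, stdev

# Per-prompt best reward at N=100 for the Standard method (prompts 0-9).
best_n100 = [4.9242, 5.0537, 1.3245, 4.9703, 4.8121,
             4.7707, 3.9251, 2.9489, 4.9645, 3.9459]

avg_best = mean(best_n100)
std_dev = stdev(best_n100)  # sample standard deviation across prompts

print(f"Avg Best: {avg_best:.4f}")
print(f"Std Dev:  {std_dev:.4f}")
```

The standard deviation therefore describes spread between prompts, dominated by the low-scoring prompts 2 and 7, rather than spread between candidates of one prompt.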

VoiceCLAP-Large × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.9101	0.9363	0.9531	0.9551	0.9596
Std Dev	0.2855	0.2926	0.2796	0.2767	0.2772
Avg Mean	0.8395	0.8520	0.8452	0.8502	0.8505

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.1649	1.2016	1.2032	1.2000	1.2083
1	French	1.1614	1.1634	1.1764	1.1764	1.1764
2	English	0.3377	0.3422	0.3625	0.3625	0.3625
3	German	1.1193	1.1491	1.1478	1.1506	1.1506
4	French	1.2467	1.2702	1.2753	1.2651	1.2753
5	French	0.8431	0.9029	0.9029	0.9199	0.9329
6	English	0.8819	0.9138	0.9167	0.9167	0.9167
7	German	0.5716	0.5705	0.6449	0.6559	0.6679
8	Spanish	0.8917	0.8849	0.9194	0.9194	0.9194
9	French	0.8825	0.9643	0.9823	0.9850	0.9861

VoiceCLAP-Small × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.7434	0.7655	0.7740	0.7969	0.7989
Std Dev	0.2771	0.2999	0.2801	0.2858	0.2856
Avg Mean	0.6489	0.6534	0.6377	0.6473	0.6468

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	0.8133	0.7898	0.8325	0.8476	0.8532
1	French	0.9710	1.0147	1.0117	1.0346	1.0346
2	English	0.2543	0.2492	0.2606	0.2626	0.2656
3	German	1.0623	1.0517	1.0732	1.0850	1.0850
4	French	1.1038	1.1580	1.0973	1.1547	1.1580
5	French	0.5162	0.5541	0.6197	0.6439	0.6439
6	English	0.7100	0.7589	0.7611	0.7611	0.7611
7	German	0.4707	0.4178	0.5047	0.5004	0.5047
8	Spanish	0.6328	0.6432	0.5757	0.6616	0.6616
9	French	0.8997	1.0174	1.0032	1.0174	1.0211

VoiceCLAP-Large × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.3972	1.4204	1.4363	1.4510	1.4585
Std Dev	0.4534	0.4511	0.4395	0.4345	0.4370
Avg Mean	1.2788	1.3014	1.2854	1.2922	1.2937

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.7369	1.7760	1.7637	1.7724	1.7760
1	French	1.8243	1.8369	1.8330	1.8369	1.8409
2	English	0.4488	0.4581	0.4642	0.4642	0.4720
3	German	1.8076	1.8342	1.8335	1.8311	1.8435
4	French	1.5887	1.5845	1.5863	1.5968	1.5968
5	French	1.5383	1.6107	1.6307	1.6593	1.6960
6	English	1.3237	1.2919	1.3063	1.3304	1.3304
7	German	0.8218	0.8661	0.9455	1.0004	1.0004
8	Spanish	1.6505	1.6173	1.6498	1.6498	1.6599
9	French	1.2318	1.3282	1.3502	1.3687	1.3687

VoiceCLAP-Small × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.8601	0.8677	0.8774	0.8891	0.9028
Std Dev	0.2878	0.2836	0.2824	0.2867	0.2802
Avg Mean	0.7585	0.7676	0.7591	0.7618	0.7624

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.3089	1.3236	1.3231	1.3180	1.3236
1	French	0.8770	0.8758	0.8933	0.8933	0.8978
2	English	0.3393	0.3120	0.3140	0.3172	0.3393
3	German	1.1609	1.1798	1.1709	1.1873	1.1873
4	French	0.8536	0.8944	0.8916	0.9098	0.9098
5	French	0.9161	1.0036	1.0036	0.9609	1.0036
6	English	1.1154	0.9546	1.0126	1.1154	1.1198
7	German	0.5875	0.6317	0.6484	0.6637	0.7078
8	Spanish	0.7441	0.7594	0.7529	0.7690	0.7746
9	French	0.6984	0.7422	0.7640	0.7566	0.7640

v1 Natural (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.9750	1.0031	1.0236	1.0246	1.0333
Std Dev	0.2898	0.2938	0.2902	0.2866	0.2863
Avg Mean	0.8977	0.9104	0.9045	0.9083	0.9090

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.2812	1.3153	1.3167	1.3153	1.3167
1	French	1.2239	1.2264	1.2587	1.2587	1.2587
2	English	0.3855	0.3863	0.3988	0.3988	0.3988
3	German	1.1058	1.1254	1.1241	1.1259	1.1259
4	French	1.3085	1.3121	1.3425	1.3218	1.3425
5	French	0.9486	1.0200	1.0200	1.0337	1.0564
6	English	0.9223	0.9729	0.9929	0.9929	0.9982
7	German	0.6271	0.6375	0.6959	0.7044	0.7279
8	Spanish	0.9762	0.9762	0.9916	0.9957	1.0097
9	French	0.9714	1.0590	1.0952	1.0985	1.0985

v2 Authentic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.0092	1.0361	1.0536	1.0591	1.0631
Std Dev	0.2796	0.2874	0.2747	0.2752	0.2742
Avg Mean	0.9293	0.9444	0.9367	0.9417	0.9420

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.1473	1.1727	1.1743	1.1738	1.1743
1	French	1.2769	1.2900	1.2976	1.2976	1.2976
2	English	0.3942	0.3923	0.4098	0.4072	0.4110
3	German	1.1747	1.2026	1.1973	1.2026	1.2026
4	French	1.2812	1.3065	1.3131	1.3131	1.3131
5	French	1.0562	1.1132	1.1132	1.1470	1.1571
6	English	0.9454	0.9761	0.9719	0.9842	0.9996
7	German	0.6863	0.6881	0.7825	0.7917	0.7929
8	Spanish	1.1554	1.1523	1.1853	1.1853	1.1867
9	French	0.9747	1.0676	1.0913	1.0881	1.0964

v3 Professional (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.8574	0.8850	0.9008	0.9019	0.9088
Std Dev	0.2657	0.2709	0.2551	0.2524	0.2551
Avg Mean	0.7889	0.8015	0.7963	0.8004	0.8005

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.0724	1.1219	1.1183	1.1142	1.1367
1	French	1.1011	1.1017	1.1149	1.1149	1.1149
2	English	0.3100	0.3274	0.3536	0.3536	0.3536
3	German	1.0849	1.1021	1.0997	1.1039	1.1039
4	French	1.1436	1.1703	1.1724	1.1602	1.1724
5	French	0.7820	0.8424	0.8424	0.8601	0.8731
6	English	0.8488	0.8953	0.8884	0.8884	0.8953
7	German	0.5576	0.5474	0.6260	0.6353	0.6388
8	Spanish	0.8439	0.8409	0.8767	0.8667	0.8770
9	French	0.8301	0.9009	0.9151	0.9220	0.9220

v4 Expressive (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.0185	1.0361	1.0584	1.0717	1.0756
Std Dev	0.2936	0.2848	0.2884	0.2924	0.2915
Avg Mean	0.9240	0.9409	0.9344	0.9365	0.9368

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.2724	1.2842	1.2953	1.2953	1.2953
1	French	1.2554	1.2501	1.3152	1.3152	1.3152
2	English	0.3948	0.4184	0.4087	0.4138	0.4184
3	German	1.0474	1.0655	1.0659	1.0767	1.0767
4	French	1.2318	1.2347	1.2657	1.2965	1.2965
5	French	1.1161	1.1557	1.1742	1.2255	1.2255
6	English	1.1639	1.1658	1.1415	1.1725	1.1850
7	German	0.6103	0.6372	0.7032	0.7113	0.7113
8	Spanish	1.1484	1.1364	1.1781	1.1781	1.1781
9	French	0.9444	1.0127	1.0360	1.0324	1.0538

v5 Cinematic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.9942	1.0057	1.0249	1.0338	1.0391
Std Dev	0.2707	0.2753	0.2622	0.2664	0.2675
Avg Mean	0.8979	0.9136	0.9063	0.9094	0.9100

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.0969	1.1129	1.1355	1.1355	1.1355
1	French	1.2305	1.2361	1.2439	1.2439	1.2460
2	English	0.4020	0.3989	0.4057	0.4133	0.4133
3	German	1.1007	1.1249	1.1249	1.1132	1.1249
4	French	1.2252	1.2376	1.2442	1.2570	1.2570
5	French	1.1337	1.1758	1.1852	1.2318	1.2494
6	English	1.0978	1.0669	1.0705	1.0978	1.0978
7	German	0.6140	0.6238	0.7296	0.7210	0.7296
8	Spanish	1.0503	1.0292	1.0467	1.0494	1.0540
9	French	0.9911	1.0506	1.0623	1.0748	1.0836

v6 Natural (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.7754	0.8111	0.8265	0.8481	0.8562
Std Dev	0.2489	0.2572	0.2529	0.2562	0.2528
Avg Mean	0.7011	0.7094	0.6968	0.7031	0.7030

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	0.5961	0.6320	0.6632	0.6942	0.7206
1	French	1.0722	1.0964	1.1172	1.1408	1.1408
2	English	0.3032	0.2928	0.3076	0.3139	0.3139
3	German	1.0064	0.9901	1.0367	1.0477	1.0477
4	French	1.0332	1.0332	1.0546	1.0809	1.0809
5	French	0.6138	0.6983	0.6910	0.7417	0.7417
6	English	0.7623	0.8702	0.7969	0.8378	0.8702
7	German	0.5840	0.5762	0.6664	0.6450	0.6664
8	Spanish	0.8426	0.8736	0.8782	0.9211	0.9211
9	French	0.9404	1.0485	1.0534	1.0583	1.0583

v7 Authentic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.7471	0.7727	0.7910	0.8014	0.8104
Std Dev	0.2238	0.2381	0.2310	0.2308	0.2324
Avg Mean	0.6756	0.6847	0.6717	0.6787	0.6788

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	0.6445	0.6336	0.6879	0.7069	0.7227
1	French	1.0493	1.0703	1.0736	1.0749	1.0749
2	English	0.3181	0.3160	0.3209	0.3221	0.3221
3	German	0.9596	0.9767	1.0000	0.9866	1.0000
4	French	0.9540	0.9726	0.9779	1.0113	1.0113
5	French	0.6895	0.7726	0.7645	0.8041	0.8041
6	English	0.6247	0.7115	0.6667	0.6856	0.7115
7	German	0.5465	0.4909	0.5858	0.5857	0.5858
8	Spanish	0.8518	0.8706	0.9000	0.9041	0.9041
9	French	0.8327	0.9118	0.9322	0.9322	0.9677

v8 Professional (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.6180	0.6390	0.6422	0.6624	0.6708
Std Dev	0.2338	0.2402	0.2228	0.2209	0.2280
Avg Mean	0.5415	0.5460	0.5348	0.5412	0.5416

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	0.8310	0.7950	0.8327	0.8434	0.8599
1	French	0.7541	0.8065	0.7737	0.8065	0.8065
2	English	0.2236	0.2211	0.2369	0.2363	0.2401
3	German	0.8567	0.8666	0.8691	0.8600	0.8737
4	French	0.9368	0.9622	0.8921	0.9368	0.9622
5	French	0.3894	0.4351	0.4726	0.4907	0.4907
6	English	0.5885	0.6310	0.6484	0.6484	0.6484
7	German	0.3994	0.3874	0.4370	0.4849	0.4849
8	Spanish	0.5215	0.5180	0.4820	0.5390	0.5390
9	French	0.6790	0.7669	0.7776	0.7776	0.8027

v9 Natural−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.7588	1.7906	1.8169	1.8278	1.8339
Std Dev	0.5025	0.5067	0.4897	0.4912	0.4898
Avg Mean	1.6184	1.6460	1.6297	1.6368	1.6386

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.1794	2.1794	2.1909	2.1909	2.1909
1	French	2.1325	2.1447	2.1478	2.1525	2.1525
2	English	0.6168	0.6158	0.6198	0.6226	0.6226
3	German	2.0821	2.0975	2.0896	2.1060	2.1060
4	French	2.2102	2.2160	2.2246	2.2251	2.2251
5	French	1.8136	1.9156	1.9156	1.9413	1.9714
6	English	1.6954	1.7115	1.7439	1.7492	1.7581
7	German	1.2074	1.2167	1.3769	1.3874	1.4091
8	Spanish	1.9343	1.9343	1.9553	1.9912	1.9912
9	French	1.7165	1.8746	1.9047	1.9120	1.9120

v10 Authentic−Cheap (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.8458	1.8761	1.9049	1.9166	1.9189
Std Dev	0.5173	0.5306	0.5105	0.5131	0.5123
Avg Mean	1.6952	1.7243	1.7067	1.7155	1.7166

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.9831	1.9831	1.9832	1.9915	1.9915
1	French	2.3181	2.3265	2.3384	2.3384	2.3384
2	English	0.6727	0.6644	0.6739	0.6739	0.6788
3	German	2.1530	2.1925	2.1829	2.1925	2.1925
4	French	2.3130	2.3408	2.3496	2.3500	2.3500
5	French	2.0244	2.1040	2.1151	2.1764	2.1764
6	English	1.7337	1.7350	1.7275	1.7556	1.7591
7	German	1.2732	1.2647	1.4695	1.4734	1.4734
8	Spanish	2.1688	2.1482	2.1943	2.1943	2.2014
9	French	1.8184	2.0021	2.0147	2.0205	2.0279

v11 Professional−Distorted (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.7378	1.7827	1.8108	1.8153	1.8213
Std Dev	0.5083	0.5139	0.4875	0.4872	0.4886
Avg Mean	1.5996	1.6267	1.6137	1.6218	1.6225

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.1264	2.1857	2.1937	2.1941	2.1961
1	French	2.2038	2.1910	2.2209	2.2209	2.2209
2	English	0.6428	0.6665	0.6948	0.6948	0.6948
3	German	2.0039	2.0346	2.0442	2.0442	2.0442
4	French	2.2690	2.3018	2.2951	2.2947	2.3018
5	French	1.7740	1.8894	1.8894	1.9319	1.9457
6	English	1.6992	1.7574	1.7450	1.7450	1.7574
7	German	1.1192	1.1122	1.2889	1.2986	1.2986
8	Spanish	1.8646	1.8423	1.8783	1.8694	1.8866
9	French	1.6754	1.8463	1.8581	1.8597	1.8670

v12 Expressive−Flat (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.7552	1.7865	1.8133	1.8333	1.8406
Std Dev	0.5150	0.5011	0.4986	0.5046	0.5052
Avg Mean	1.5997	1.6314	1.6166	1.6206	1.6219

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.1971	2.1971	2.2140	2.2140	2.2140
1	French	2.1709	2.1557	2.2459	2.2459	2.2459
2	English	0.5959	0.6191	0.6191	0.6222	0.6233
3	German	1.8857	1.9078	1.9085	1.9139	1.9493
4	French	2.0865	2.1030	2.1157	2.1549	2.1549
5	French	1.9484	2.0159	2.0159	2.1084	2.1097
6	English	1.9152	1.9041	1.8811	1.9226	1.9431
7	German	1.1279	1.1905	1.3057	1.3265	1.3281
8	Spanish	2.0190	2.0190	2.0544	2.0650	2.0650
9	French	1.6051	1.7526	1.7724	1.7591	1.7724

v13 FullPos−FullNeg (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.7552	1.7934	1.8229	1.8280	1.8360
Std Dev	0.5249	0.5279	0.5106	0.5108	0.5092
Avg Mean	1.6186	1.6464	1.6315	1.6389	1.6404

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.1390	2.1669	2.1769	2.1760	2.1769
1	French	2.2840	2.2839	2.3132	2.3132	2.3132
2	English	0.6080	0.6223	0.6360	0.6360	0.6360
3	German	2.0264	2.0533	2.0575	2.0575	2.0575
4	French	2.2490	2.2571	2.2751	2.2666	2.2751
5	French	1.8284	1.9253	1.9253	1.9678	1.9873
6	English	1.7041	1.7305	1.7425	1.7425	1.7656
7	German	1.1266	1.1320	1.2916	1.2945	1.3186
8	Spanish	1.8750	1.8750	1.9048	1.9229	1.9229
9	French	1.7117	1.8881	1.9059	1.9030	1.9066

v14 Warm−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.6719	1.7076	1.7300	1.7401	1.7497
Std Dev	0.5043	0.5091	0.4924	0.4933	0.4923
Avg Mean	1.5430	1.5696	1.5532	1.5615	1.5629

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.0858	2.1127	2.1294	2.1209	2.1294
1	French	2.0716	2.0794	2.0854	2.0817	2.0959
2	English	0.5597	0.5595	0.5642	0.5700	0.5700
3	German	2.0887	2.1301	2.1107	2.1301	2.1301
4	French	2.1116	2.1116	2.1208	2.1287	2.1287
5	French	1.6915	1.7711	1.7711	1.8079	1.8485
6	English	1.6241	1.6486	1.6711	1.6711	1.6711
7	German	1.1051	1.1148	1.2604	1.2598	1.2932
8	Spanish	1.8116	1.8116	1.8490	1.8628	1.8628
9	French	1.5697	1.7361	1.7382	1.7676	1.7676

v15 Natural−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.8316	1.8742	1.9115	1.9234	1.9383
Std Dev	0.5408	0.5515	0.5329	0.5321	0.5352
Avg Mean	1.6750	1.7055	1.6843	1.6942	1.6950

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.8830	1.9719	2.0044	1.9967	2.0602
1	French	2.4418	2.4318	2.4515	2.4515	2.4515
2	English	0.6321	0.6397	0.6540	0.6512	0.6540
3	German	2.2744	2.2793	2.2945	2.3125	2.3125
4	French	2.3183	2.3586	2.4140	2.3801	2.4140
5	French	1.8445	1.9063	1.9413	2.0193	2.0193
6	English	1.8042	1.8912	1.8185	1.8422	1.8912
7	German	1.2370	1.2076	1.4390	1.4474	1.4474
8	Spanish	2.0023	2.0023	2.0029	2.0383	2.0383
9	French	1.8788	2.0530	2.0949	2.0949	2.0949

v16 Authentic−Cheap (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.8657	1.8969	1.9666	1.9741	1.9951
Std Dev	0.5473	0.5720	0.5630	0.5615	0.5636
Avg Mean	1.6937	1.7300	1.7060	1.7167	1.7168

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.7826	1.7826	1.8602	1.8893	1.9237
1	French	2.5373	2.4852	2.5373	2.5630	2.5630
2	English	0.6786	0.6458	0.6629	0.6604	0.6786
3	German	2.3800	2.4158	2.4166	2.4512	2.4512
4	French	2.2002	2.2913	2.3926	2.3190	2.3926
5	French	2.0492	2.1373	2.1570	2.2059	2.2122
6	English	1.6666	1.7311	1.7504	1.7504	1.7641
7	German	1.3244	1.2762	1.4962	1.5280	1.5280
8	Spanish	2.1587	2.2440	2.3004	2.2813	2.3004
9	French	1.8798	1.9593	2.0928	2.0928	2.1373

v17 Professional−Distorted (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.7016	1.7315	1.7590	1.7725	1.7755
Std Dev	0.5213	0.5325	0.5174	0.5154	0.5164
Avg Mean	1.5544	1.5782	1.5583	1.5685	1.5697

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.0680	2.1123	2.1329	2.1329	2.1329
1	French	2.1845	2.1812	2.1980	2.1980	2.1980
2	English	0.5533	0.5410	0.5539	0.5615	0.5615
3	German	2.0693	2.0525	2.0693	2.0852	2.0891
4	French	2.2154	2.2812	2.2721	2.2721	2.2812
5	French	1.7162	1.7538	1.8142	1.8170	1.8304
6	English	1.6823	1.6819	1.6790	1.7115	1.7115
7	German	1.1034	1.1275	1.2649	1.2747	1.2775
8	Spanish	1.7463	1.7329	1.7396	1.7926	1.7926
9	French	1.6772	1.8512	1.8659	1.8799	1.8799

v18 Expressive−Flat (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.5761	1.6173	1.6483	1.6891	1.7099
Std Dev	0.4859	0.4723	0.4746	0.4751	0.4785
Avg Mean	1.4361	1.4513	1.4485	1.4490	1.4513

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.0167	2.0921	2.0652	2.0561	2.0921
1	French	1.8167	1.7639	1.9345	1.9345	1.9345
2	English	0.5129	0.5415	0.5292	0.5292	0.5436
3	German	1.7639	1.8144	1.8174	1.8178	1.8178
4	French	1.7889	1.7824	1.9028	1.9040	1.9401
5	French	1.7407	1.7813	1.7813	1.9195	1.9680
6	English	1.6225	1.7011	1.6334	1.7386	1.7386
7	German	0.9665	1.0668	1.1447	1.2484	1.2484
8	Spanish	2.0735	2.0618	2.0405	2.1084	2.1084
9	French	1.4591	1.5680	1.6341	1.6341	1.7073

v19 FullPos−FullNeg (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.6357	1.6796	1.7039	1.7201	1.7248
Std Dev	0.4760	0.4878	0.4697	0.4647	0.4662
Avg Mean	1.5001	1.5248	1.5075	1.5163	1.5178

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.8948	1.9396	1.9764	1.9849	1.9849
1	French	2.0914	2.1129	2.1206	2.1206	2.1206
2	English	0.5701	0.5703	0.5808	0.5903	0.5903
3	German	2.0078	1.9958	2.0078	2.0045	2.0092
4	French	2.0486	2.1202	2.0950	2.1063	2.1202
5	French	1.6271	1.6975	1.7564	1.7724	1.7825
6	English	1.5685	1.6643	1.5997	1.6460	1.6643
7	German	1.1157	1.1053	1.2710	1.3073	1.3073
8	Spanish	1.8430	1.8430	1.8656	1.8656	1.8656
9	French	1.5896	1.7474	1.7656	1.8034	1.8034

v20 Warm−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.7297	1.7715	1.8069	1.8281	1.8450
Std Dev	0.5512	0.5606	0.5566	0.5483	0.5554
Avg Mean	1.5748	1.5983	1.5818	1.5902	1.5910

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.0987	2.1787	2.1955	2.2008	2.2211
1	French	2.2335	2.1805	2.2823	2.2823	2.2823
2	English	0.5374	0.5468	0.5585	0.5585	0.5585
3	German	2.1886	2.1958	2.1843	2.2180	2.2180
4	French	2.2001	2.2611	2.3347	2.2925	2.3426
5	French	1.8094	1.8033	1.8196	1.8954	1.9618
6	English	1.6844	1.7625	1.7094	1.7348	1.7625
7	German	1.0359	1.0306	1.1912	1.2506	1.2506
8	Spanish	1.7668	1.8401	1.7810	1.8356	1.8401
9	French	1.7423	1.9158	2.0128	2.0128	2.0128

v21 Sanitized Prompt (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.1532	1.1751	1.1969	1.2031	1.2158
Std Dev	0.3592	0.3584	0.3526	0.3479	0.3495
Avg Mean	1.0533	1.0724	1.0610	1.0653	1.0664

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.5740	1.5844	1.5813	1.5813	1.5940
1	French	1.4557	1.4532	1.4931	1.4931	1.4931
2	English	0.4226	0.4262	0.4321	0.4321	0.4461
3	German	1.3554	1.3975	1.4017	1.3870	1.4017
4	French	1.2531	1.2596	1.2630	1.2702	1.2704
5	French	1.3202	1.3904	1.4050	1.4208	1.4602
6	English	1.2157	1.2186	1.2155	1.2302	1.2619
7	German	0.6456	0.6824	0.7430	0.7800	0.7846
8	Spanish	1.2416	1.2072	1.2465	1.2487	1.2487
9	French	1.0485	1.1319	1.1879	1.1879	1.1972

v22 Sanitized Prompt (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.8346	0.8339	0.8457	0.8624	0.8819
Std Dev	0.2829	0.2756	0.2813	0.2823	0.2844
Avg Mean	0.7299	0.7319	0.7253	0.7279	0.7295

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.3277	1.3434	1.3526	1.3339	1.3526
1	French	0.8354	0.8514	0.8513	0.8579	0.8620
2	English	0.3389	0.3153	0.3164	0.3184	0.3389
3	German	1.0372	1.0500	1.0550	1.0646	1.0868
4	French	0.7777	0.7777	0.7938	0.8160	0.8160
5	French	0.8999	0.9857	0.9857	0.9709	1.0191
6	English	1.1177	0.9355	1.0244	1.1225	1.1515
7	German	0.5824	0.6189	0.6292	0.6494	0.6994
8	Spanish	0.7867	0.7738	0.7439	0.7867	0.7878
9	French	0.6421	0.6872	0.7048	0.7035	0.7048

v23 Sanitized−Uncanny (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.9507	1.9826	2.0204	2.0303	2.0413
Std Dev	0.5932	0.5908	0.5770	0.5746	0.5733
Avg Mean	1.7915	1.8252	1.8047	1.8127	1.8148
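
To make the diminishing returns concrete, the sketch below takes the v23 "Avg Best" row above and computes the gain between consecutive N values, along with the gain per additional candidate. It is a reading aid built only from the published row. The helper name `marginal_gains` is hypothetical.

```python
# N values and the v23 "Avg Best" row, copied from the table above.
N_VALUES = [5, 10, 25, 50, 100]
V23_AVG_BEST = [1.9507, 1.9826, 2.0204, 2.0303, 2.0413]


def marginal_gains(n_values, avg_best):
    """Return (gain, gain_per_extra_candidate) for each step between N values."""
    steps = []
    for i in range(1, len(n_values)):
        gain = avg_best[i] - avg_best[i - 1]
        extra_candidates = n_values[i] - n_values[i - 1]
        steps.append((round(gain, 4), gain / extra_candidates))
    return steps


steps = marginal_gains(N_VALUES, V23_AVG_BEST)
for (gain, per_candidate), n_from, n_to in zip(steps, N_VALUES, N_VALUES[1:]):
    print(f"N={n_from} -> N={n_to}: +{gain:.4f} total, +{per_candidate:.5f} per extra candidate")
```

The step gains are 0.0319, 0.0378, 0.0099 and 0.0110, and the gain per extra candidate falls at every step. Going from N=5 to N=100 adds about 0.09 reward in total for a twentyfold increase in candidates.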

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.5428	2.5578	2.5701	2.5701	2.5732
1	French	2.4516	2.4411	2.4838	2.4838	2.4838
2	English	0.6597	0.6714	0.6801	0.6801	0.6910
3	German	2.3243	2.3700	2.3712	2.3700	2.3712
4	French	2.2234	2.2287	2.2458	2.2472	2.2488
5	French	2.1543	2.2619	2.2863	2.3173	2.3656
6	English	1.9721	1.9571	1.9686	2.0113	2.0365
7	German	1.1920	1.2294	1.3822	1.4114	1.4271
8	Spanish	2.1722	2.1375	2.1884	2.1841	2.1884
9	French	1.8144	1.9714	2.0272	2.0272	2.0272

v24 Sanitized−Uncanny (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.8188	1.8346	1.8690	1.9133	1.9251
Std Dev	0.5448	0.5561	0.5507	0.5476	0.5391
Avg Mean	1.6350	1.6602	1.6474	1.6528	1.6545

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.5393	2.5393	2.5454	2.5520	2.5520
1	French	2.1710	2.1591	2.2309	2.2309	2.2309
2	English	0.6485	0.6004	0.5913	0.6071	0.6485
3	German	2.2496	2.2748	2.2354	2.3283	2.3283
4	French	1.9298	2.0449	2.1306	2.0866	2.1306
5	French	1.9965	2.1434	2.1434	2.1468	2.1712
6	English	2.0356	1.8748	1.8860	2.0871	2.0871
7	German	1.2689	1.3276	1.4254	1.5185	1.5223
8	Spanish	1.8110	1.8026	1.8054	1.8557	1.8557
9	French	1.5374	1.5788	1.6966	1.7204	1.7243

Prompt #0 — English (Silicon Valley accent)

Language: English · Accent: Silicon Valley accent · Scored: 100/100

DramaBox Prompt

A young woman, possessing an extremely high fundamental frequency and bright, delicate harmonic texture, with a brisk, elevated momentum and a Silicon Valley accent; this is a pristine, high-quality studio voice recording with no background noise. She delivers the lines with a teasing lightness that occasionally borders on nervous energy, punctuated by small moments of genuine relief. (A brief, high-pitched Giggle escapes as she begins.) "Honestly, you think finding a solid Firestone review is that hard? Boggle, really. But look, that Lys thing actually worked." (She pauses, a subtle Contemplation washing over her features, then manages a slight, contained Chuckle.) "Just wait, I'll show you." She concludes with a soft, almost satisfied sigh, allowing the tension to dissipate.

Prompt #1 — French

Language: French · Scored: 100/100

DramaBox Prompt

High-pitched, delicately resonant, and possessing the slightly strained clarity of a young adult female soprano; the voice is bright and purely head-dominant, engineered for intimate projection.

Pauses briefly, gathering strength. "Malgré la profondeur de cette sombre forêt, je sens toujours cette confiance absolue en mon chemin, guidée par la lumière."
A slight, almost imperceptible hardening of tone. "Même au cœur de cette nuit insondable, ma boussole intérieure me montre la seule direction véritable."
She finishes, a note of unwavering certainty settling.

The pace remains glacially slow throughout the utterance. The delivery conveys immense, quiet self-assurance.
