Scenema Audio — Best-of-N: Diminishing Returns

10 Path A prompts × 100 candidates, 29 ranking methods

Methodology: Ranking Method Formulas & Text Prompts

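All methods compute reward = (1 − WER) × max(score, 0), where WER is the word error rate and score is the method-specific term listed in the table below. Negative scores are clipped to zero before weighting. Where two rows share the same formula and text prompt, they differ only in the VoiceCLAP model variant (Large or Small) named in the method label.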

#	Key	Score Formula	Text Prompt(s)
0	standard	Content Enjoyment	—
1	clap_lq	cos(audio, quality_text)	"pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
2	clap_sq	cos(audio, quality_text)	"pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
3	clap_lp	cos(audio, prompt)	Original DramaBox prompt
4	clap_sp	cos(audio, prompt)	Original DramaBox prompt
5	v1_nat_L	cos(audio, nat)	"natural, spontaneous, lifelike speech with genuine emotion"
6	v2_auth_L	cos(audio, auth)	"authentic, emotionally truthful, deeply felt voice performance"
7	v3_pro_L	cos(audio, pro)	"professional studio recording, crystal clear high-fidelity audio"
8	v4_expr_L	cos(audio, expr)	"expressive, dynamic voice acting with rich emotional range"
9	v5_cine_L	cos(audio, cine)	"immersive cinematic narration, compelling storytelling"
10	v6_nat_S	cos(audio, nat)	"natural, spontaneous, lifelike speech with genuine emotion"
11	v7_auth_S	cos(audio, auth)	"authentic, emotionally truthful, deeply felt voice performance"
12	v8_pro_S	cos(audio, pro)	"professional studio recording, crystal clear high-fidelity audio"
13	v9_nr_L	cos(audio, nat) − cos(audio, rob)	+ "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
14	v10_ac_L	cos(audio, auth) − cos(audio, cheap)	+ "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
15	v11_pd_L	cos(audio, pro) − cos(audio, dist)	+ "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
16	v12_ef_L	cos(audio, expr) − cos(audio, flat)	+ "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
17	v13_ff_L	cos(audio, full_pos) − cos(audio, full_neg)	+ "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
18	v14_wr_L	cos(audio, warm) − cos(audio, rob)	+ "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
19	v15_nr_S	cos(audio, nat) − cos(audio, rob)	+ "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
20	v16_ac_S	cos(audio, auth) − cos(audio, cheap)	+ "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
21	v17_pd_S	cos(audio, pro) − cos(audio, dist)	+ "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
22	v18_ef_S	cos(audio, expr) − cos(audio, flat)	+ "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
23	v19_ff_S	cos(audio, full_pos) − cos(audio, full_neg)	+ "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
24	v20_wr_S	cos(audio, warm) − cos(audio, rob)	+ "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
25	v21_san_L	cos(audio, sanitized_prompt)	Quoted speech removed (Large)
26	v22_san_S	cos(audio, sanitized_prompt)	Quoted speech removed (Small)
27	v23_snr_L	cos(audio, sanitized) − cos(audio, neg_san)	Sanitized / − "robotic, distorted, uncanny" (Large)
28	v24_snr_S	cos(audio, sanitized) − cos(audio, neg_san)	Sanitized / − "robotic, distorted, uncanny" (Small)
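
The formulas in the table can be sketched in a few lines. This is a minimal illustration, not the pipeline that produced the report: the function names are hypothetical, embeddings are assumed to be plain lists of floats, and no score scaling is applied. Because some tabulated rewards exceed the range an unscaled cosine allows, the report presumably applies a scaling that is not documented here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_score(audio, positive, negative):
    """cos(audio, positive) - cos(audio, negative), as in rows 13-24 and 27-28."""
    return cosine(audio, positive) - cosine(audio, negative)

def reward(wer, score):
    """reward = (1 - WER) * max(score, 0); negative scores are clipped to 0."""
    return (1.0 - wer) * max(score, 0.0)

# Toy 2-D embeddings (assumed values, for illustration only).
audio, pos, neg = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
print(reward(0.1, contrastive_score(audio, pos, neg)))
```

A candidate whose contrastive score is negative receives a reward of exactly zero regardless of its WER, so it can never be selected over a candidate with any positive score.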

Cross-Method Diminishing Returns Comparison

Method	N=5	N=10	N=25	N=50	N=100	Gain N=5→100	Knee Point
Standard: (1−WER) × Content Enjoyment	3.6402	4.0591	4.2778	4.3003	4.3509	+0.7107	N=50
VoiceCLAP-Large × Quality Text	0.8578	0.9465	0.9788	0.9977	1.0018	+0.1440	N=50
VoiceCLAP-Small × Quality Text	0.6678	0.7458	0.7714	0.7913	0.7979	+0.1302	N=100
VoiceCLAP-Large × Prompt Match	1.2582	1.4203	1.4750	1.4850	1.5013	+0.2430	N=50
VoiceCLAP-Small × Prompt Match	0.7316	0.8069	0.8436	0.8484	0.8696	+0.1380	N=50
v1 Natural (Large)	0.9245	1.0314	1.0628	1.0900	1.0968	+0.1722	N=100
v2 Authentic (Large)	0.9405	1.0561	1.1017	1.1205	1.1271	+0.1866	N=50
v3 Professional (Large)	0.8097	0.8920	0.9233	0.9415	0.9450	+0.1353	N=50
v4 Expressive (Large)	0.9601	1.0789	1.1310	1.1307	1.1498	+0.1896	N=50
v5 Cinematic (Large)	0.9141	1.0181	1.0645	1.0716	1.0897	+0.1756	N=50
v6 Natural (Small)	0.6865	0.7797	0.7999	0.8265	0.8493	+0.1628	N=100
v7 Authentic (Small)	0.6046	0.6918	0.7222	0.7402	0.7547	+0.1501	N=100
v8 Professional (Small)	0.5712	0.6338	0.6486	0.6738	0.6775	+0.1063	N=100
v9 Natural−Robotic (Large)	1.6291	1.8186	1.8798	1.9103	1.9195	+0.2904	N=50
v10 Authentic−Cheap (Large)	1.6832	1.8913	1.9613	1.9899	1.9996	+0.3164	N=50
v11 Professional−Distorted (Large)	1.6093	1.7896	1.8517	1.8897	1.8934	+0.2841	N=50
v12 Expressive−Flat (Large)	1.6379	1.8253	1.9079	1.9154	1.9357	+0.2977	N=50
v13 FullPos−FullNeg (Large)	1.6227	1.8137	1.8707	1.9040	1.9110	+0.2883	N=50
v14 Warm−Robotic (Large)	1.5354	1.7098	1.7688	1.7972	1.8013	+0.2659	N=50
v15 Natural−Robotic (Small)	1.6384	1.8358	1.9058	1.9259	1.9414	+0.3030	N=50
v16 Authentic−Cheap (Small)	1.6024	1.8099	1.8977	1.9041	1.9280	+0.3255	N=50
v17 Professional−Distorted (Small)	1.4980	1.6583	1.7254	1.7480	1.7686	+0.2706	N=50
v18 Expressive−Flat (Small)	1.5510	1.7345	1.8091	1.8207	1.8629	+0.3119	N=50
v19 FullPos−FullNeg (Small)	1.4562	1.6334	1.6862	1.7129	1.7295	+0.2733	N=50
v20 Warm−Robotic (Small)	1.5747	1.7640	1.8339	1.8643	1.8857	+0.3109	N=50
v21 Sanitized Prompt (Large)	1.0507	1.1807	1.2371	1.2467	1.2569	+0.2062	N=50
v22 Sanitized Prompt (Small)	0.7210	0.8015	0.8351	0.8340	0.8599	+0.1389	N=50
v23 Sanitized−Uncanny (Large)	1.7781	1.9943	2.0676	2.0926	2.1024	+0.3243	N=50
v24 Sanitized−Uncanny (Small)	1.5440	1.7067	1.7908	1.7898	1.8203	+0.2763	N=50
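
The source does not state how the N-candidate subsets were drawn from each prompt's 100-candidate pool. The sketch below assumes one random draw without replacement per N, which would be consistent with per-prompt best values that occasionally dip as N grows. `best_of_n_curve` and its parameters are hypothetical names, and the pool values are placeholders rather than data from the report.

```python
import random

def best_of_n_curve(rewards, ns=(5, 10, 25, 50, 100), seed=0):
    """Best reward within a random subset of size n, for each n in ns.

    Assumption: one independent draw without replacement per n.
    """
    rng = random.Random(seed)
    return {n: max(rng.sample(rewards, n)) for n in ns if n <= len(rewards)}

# Placeholder pool of 100 candidate rewards for a single prompt.
pool = [i / 100 for i in range(100)]
curve = best_of_n_curve(pool)
print(curve)
print("gain N=5 to N=100:", curve[100] - curve[5])
```

Averaging such curves over the 10 prompts would give a row of the table above. With nested subsets instead of independent draws, each curve would be monotonically non-decreasing in N.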

[Chart: Diminishing Returns, All Methods Overlaid]

Marginal Improvement per Additional Candidate

Method	N=5→10	N=10→25	N=25→50	N=50→100
Standard: (1−WER) × Content Enjoyment	0.08379/cand (11.5%)	0.01457/cand (5.4%)	0.00090/cand (0.5%)	0.00101/cand (1.2%)
VoiceCLAP-Large × Quality Text	0.01773/cand (10.3%)	0.00215/cand (3.4%)	0.00076/cand (1.9%)	0.00008/cand (0.4%)
VoiceCLAP-Small × Quality Text	0.01560/cand (11.7%)	0.00171/cand (3.4%)	0.00079/cand (2.6%)	0.00013/cand (0.8%)
VoiceCLAP-Large × Prompt Match	0.03241/cand (12.9%)	0.00365/cand (3.9%)	0.00040/cand (0.7%)	0.00033/cand (1.1%)
VoiceCLAP-Small × Prompt Match	0.01506/cand (10.3%)	0.00244/cand (4.5%)	0.00019/cand (0.6%)	0.00042/cand (2.5%)
v1 Natural (Large)	0.02138/cand (11.6%)	0.00209/cand (3.0%)	0.00109/cand (2.6%)	0.00014/cand (0.6%)
v2 Authentic (Large)	0.02312/cand (12.3%)	0.00304/cand (4.3%)	0.00075/cand (1.7%)	0.00013/cand (0.6%)
v3 Professional (Large)	0.01647/cand (10.2%)	0.00208/cand (3.5%)	0.00073/cand (2.0%)	0.00007/cand (0.4%)
v4 Expressive (Large)	0.02374/cand (12.4%)	0.00348/cand (4.8%)	-0.00001/cand (-0.0%)	0.00038/cand (1.7%)
v5 Cinematic (Large)	0.02080/cand (11.4%)	0.00309/cand (4.6%)	0.00028/cand (0.7%)	0.00036/cand (1.7%)
v6 Natural (Small)	0.01864/cand (13.6%)	0.00135/cand (2.6%)	0.00106/cand (3.3%)	0.00046/cand (2.8%)
v7 Authentic (Small)	0.01745/cand (14.4%)	0.00202/cand (4.4%)	0.00072/cand (2.5%)	0.00029/cand (2.0%)
v8 Professional (Small)	0.01251/cand (11.0%)	0.00099/cand (2.3%)	0.00101/cand (3.9%)	0.00007/cand (0.5%)
v9 Natural−Robotic (Large)	0.03790/cand (11.6%)	0.00408/cand (3.4%)	0.00122/cand (1.6%)	0.00018/cand (0.5%)
v10 Authentic−Cheap (Large)	0.04161/cand (12.4%)	0.00467/cand (3.7%)	0.00114/cand (1.5%)	0.00019/cand (0.5%)
v11 Professional−Distorted (Large)	0.03606/cand (11.2%)	0.00414/cand (3.5%)	0.00152/cand (2.0%)	0.00007/cand (0.2%)
v12 Expressive−Flat (Large)	0.03747/cand (11.4%)	0.00551/cand (4.5%)	0.00030/cand (0.4%)	0.00041/cand (1.1%)
v13 FullPos−FullNeg (Large)	0.03819/cand (11.8%)	0.00380/cand (3.1%)	0.00133/cand (1.8%)	0.00014/cand (0.4%)
v14 Warm−Robotic (Large)	0.03490/cand (11.4%)	0.00393/cand (3.4%)	0.00114/cand (1.6%)	0.00008/cand (0.2%)
v15 Natural−Robotic (Small)	0.03949/cand (12.1%)	0.00467/cand (3.8%)	0.00080/cand (1.1%)	0.00031/cand (0.8%)
v16 Authentic−Cheap (Small)	0.04149/cand (12.9%)	0.00585/cand (4.9%)	0.00026/cand (0.3%)	0.00048/cand (1.3%)
v17 Professional−Distorted (Small)	0.03205/cand (10.7%)	0.00448/cand (4.1%)	0.00090/cand (1.3%)	0.00041/cand (1.2%)
v18 Expressive−Flat (Small)	0.03670/cand (11.8%)	0.00497/cand (4.3%)	0.00047/cand (0.6%)	0.00084/cand (2.3%)
v19 FullPos−FullNeg (Small)	0.03544/cand (12.2%)	0.00352/cand (3.2%)	0.00107/cand (1.6%)	0.00033/cand (1.0%)
v20 Warm−Robotic (Small)	0.03786/cand (12.0%)	0.00466/cand (4.0%)	0.00121/cand (1.7%)	0.00043/cand (1.1%)
v21 Sanitized Prompt (Large)	0.02600/cand (12.4%)	0.00376/cand (4.8%)	0.00039/cand (0.8%)	0.00020/cand (0.8%)
v22 Sanitized Prompt (Small)	0.01609/cand (11.2%)	0.00224/cand (4.2%)	-0.00004/cand (-0.1%)	0.00052/cand (3.1%)
v23 Sanitized−Uncanny (Large)	0.04325/cand (12.2%)	0.00489/cand (3.7%)	0.00100/cand (1.2%)	0.00020/cand (0.5%)
v24 Sanitized−Uncanny (Small)	0.03254/cand (10.5%)	0.00560/cand (4.9%)	-0.00004/cand (-0.1%)	0.00061/cand (1.7%)
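
Each cell in this table appears to be the change in average best reward between two adjacent N values divided by the number of added candidates, with the percentage taken relative to the reward at the smaller N. A short check against the standard row of the cross-method table, using a hypothetical helper `marginal_gains`:

```python
def marginal_gains(ns, best):
    """Return (n_from, n_to, gain_per_candidate, percent_gain) per step."""
    rows = []
    for i in range(len(ns) - 1):
        step = best[i + 1] - best[i]
        per_candidate = step / (ns[i + 1] - ns[i])
        percent = 100.0 * step / best[i]  # relative to the smaller-N reward
        rows.append((ns[i], ns[i + 1], per_candidate, percent))
    return rows

# Average best reward for the standard method, copied from the cross-method table.
ns = [5, 10, 25, 50, 100]
standard = [3.6402, 4.0591, 4.2778, 4.3003, 4.3509]

for n0, n1, per_cand, pct in marginal_gains(ns, standard):
    print(f"N={n0}->{n1}: {per_cand:.5f}/cand ({pct:.1f}%)")
```

The results agree with the first row of the table to within rounding. Small differences in the last digit are expected because the inputs here are already rounded to four decimals.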

Ablation: Pronunciation Suffix Effect

Each method is evaluated at N=10 without and with the pronunciation suffix. The Δ columns are with-suffix minus without-suffix values.

Ranking Method	No Suffix Mean	No Suffix Best	No Suffix Median	With Suffix Mean	With Suffix Best	With Suffix Median	Δ Mean	Δ Best
Standard: (1−WER) × Content Enjoyment	3.1170	4.1254	3.2023	3.2409	3.9538	3.3013	+0.1239	-0.1717
VoiceCLAP-Large × Quality Text	0.7321	0.9614	0.7537	0.7569	0.9062	0.7733	+0.0247	-0.0553
VoiceCLAP-Small × Quality Text	0.5502	0.7465	0.5682	0.5635	0.7113	0.5728	+0.0133	-0.0352
VoiceCLAP-Large × Prompt Match	1.0737	1.4261	1.1058	1.1218	1.3625	1.1424	+0.0481	-0.0636
VoiceCLAP-Small × Prompt Match	0.6156	0.8121	0.6283	0.6317	0.7866	0.6306	+0.0161	-0.0255
v1 Natural (Large)	0.7875	1.0529	0.8107	0.8175	0.9852	0.8363	+0.0300	-0.0677
v2 Authentic (Large)	0.8057	1.0781	0.8293	0.8421	1.0161	0.8603	+0.0364	-0.0621
v3 Professional (Large)	0.6911	0.9015	0.7115	0.7118	0.8542	0.7277	+0.0207	-0.0473
v4 Expressive (Large)	0.8209	1.0930	0.8414	0.8571	1.0353	0.8721	+0.0362	-0.0577
v5 Cinematic (Large)	0.7752	1.0270	0.7931	0.8078	0.9805	0.8245	+0.0326	-0.0465
v6 Natural (Small)	0.5778	0.7841	0.5937	0.6001	0.7481	0.6115	+0.0223	-0.0360
v7 Authentic (Small)	0.5134	0.7102	0.5244	0.5416	0.6790	0.5508	+0.0282	-0.0312
v8 Professional (Small)	0.4695	0.6360	0.4831	0.4767	0.5988	0.4841	+0.0072	-0.0372
v9 Natural−Robotic (Large)	1.3927	1.8408	1.4285	1.4487	1.7432	1.4791	+0.0560	-0.0975
v10 Authentic−Cheap (Large)	1.4413	1.9180	1.4803	1.5089	1.8300	1.5369	+0.0675	-0.0879
v11 Professional−Distorted (Large)	1.3694	1.8197	1.4082	1.4237	1.7206	1.4517	+0.0543	-0.0991
v12 Expressive−Flat (Large)	1.3944	1.8412	1.4290	1.4538	1.7604	1.4788	+0.0594	-0.0808
v13 FullPos−FullNeg (Large)	1.3882	1.8375	1.4288	1.4461	1.7389	1.4772	+0.0579	-0.0986
v14 Warm−Robotic (Large)	1.3159	1.7272	1.3539	1.3694	1.6465	1.3960	+0.0535	-0.0806
v15 Natural−Robotic (Small)	1.3953	1.8494	1.4316	1.4535	1.7761	1.4817	+0.0583	-0.0733
v16 Authentic−Cheap (Small)	1.3594	1.8379	1.3917	1.4322	1.7727	1.4477	+0.0728	-0.0652
v17 Professional−Distorted (Small)	1.2752	1.6811	1.3130	1.3261	1.5999	1.3582	+0.0509	-0.0811
v18 Expressive−Flat (Small)	1.3127	1.7830	1.3420	1.3698	1.7062	1.3795	+0.0572	-0.0768
v19 FullPos−FullNeg (Small)	1.2411	1.6452	1.2778	1.2903	1.5751	1.3187	+0.0492	-0.0700
v20 Warm−Robotic (Small)	1.3391	1.7724	1.3682	1.3857	1.7104	1.4059	+0.0466	-0.0621
v21 Sanitized Prompt (Large)	0.8911	1.2023	0.9179	0.9263	1.1374	0.9412	+0.0352	-0.0649
v22 Sanitized Prompt (Small)	0.5978	0.8030	0.6087	0.6109	0.7625	0.6133	+0.0131	-0.0405
v23 Sanitized−Uncanny (Large)	1.5126	2.0191	1.5557	1.5725	1.9158	1.6003	+0.0599	-0.1032
v24 Sanitized−Uncanny (Small)	1.3058	1.7431	1.3361	1.3486	1.6551	1.3620	+0.0428	-0.0880

Per-Prompt Ablation: Standard Reward (N=10)

#	Lang	No Suffix Mean	No Suffix Best	With Suffix Mean	With Suffix Best	Δ Mean	Δ Best
0	English	3.6474	4.5642	3.4676	3.7627	-0.1798	-0.8015
1	French	3.4752	5.1613	4.3190	5.0907	+0.8438	-0.0706
2	English	1.5328	2.6555	1.2668	1.3606	-0.2660	-1.2949
3	German	4.9209	5.0105	4.9625	5.0478	+0.0416	+0.0373
4	French	4.2007	4.5798	3.9954	4.5486	-0.2053	-0.0312
5	French	1.5720	4.0101	1.5375	4.1837	-0.0345	+0.1736
6	English	3.3146	3.8915	3.0931	3.7265	-0.2214	-0.1650
7	German	2.1635	2.8666	2.4848	2.9931	+0.3212	+0.1265
8	Spanish	2.8079	4.5547	3.6050	4.5620	+0.7971	+0.0073
9	French	3.5353	3.9602	3.6773	4.2620	+0.1420	+0.3018
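
The aggregate standard-reward row of the suffix ablation appears to be the plain average of these per-prompt rows. The sketch below checks that reading using only the values in the table above; the variable names are illustrative.

```python
from statistics import mean

# (no-suffix mean, no-suffix best, with-suffix mean, with-suffix best)
# for prompts 0-9, copied from the per-prompt table.
per_prompt = [
    (3.6474, 4.5642, 3.4676, 3.7627),
    (3.4752, 5.1613, 4.3190, 5.0907),
    (1.5328, 2.6555, 1.2668, 1.3606),
    (4.9209, 5.0105, 4.9625, 5.0478),
    (4.2007, 4.5798, 3.9954, 4.5486),
    (1.5720, 4.0101, 1.5375, 4.1837),
    (3.3146, 3.8915, 3.0931, 3.7265),
    (2.1635, 2.8666, 2.4848, 2.9931),
    (2.8079, 4.5547, 3.6050, 4.5620),
    (3.5353, 3.9602, 3.6773, 4.2620),
]

# Average each column across the 10 prompts.
no_mean, no_best, with_mean, with_best = (mean(col) for col in zip(*per_prompt))
delta_mean = with_mean - no_mean  # with suffix minus without suffix
delta_best = with_best - no_best

print(f"mean: {no_mean:.4f} -> {with_mean:.4f} (delta {delta_mean:+.4f})")
print(f"best: {no_best:.4f} -> {with_best:.4f} (delta {delta_best:+.4f})")
```

The averages match the aggregate row to within rounding. The per-prompt view also shows that the aggregate Δ Best is dominated by the two English prompts 0 and 2, while most other prompts change little or improve.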

Standard: (1−WER) × Content Enjoyment — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	3.6402	4.0591	4.2778	4.3003	4.3509
Std Dev	1.3837	1.0022	0.8641	0.8529	0.8602
Avg Mean	3.0269	3.2295	3.1811	3.1568	3.1818

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	4.6083	4.4591	4.7830	4.8220	4.8220
1	French	4.8813	4.9086	4.9463	4.9280	5.1613
2	English	2.0031	2.0203	2.7075	2.6555	2.7075
3	German	4.9534	4.9980	5.0693	5.0811	5.0811
4	French	4.6051	4.6821	4.8282	4.6821	4.8282
5	French	0.8635	4.0968	4.8829	4.8829	4.8829
6	English	3.8327	3.8963	3.9148	3.9870	3.9870
7	German	2.5382	2.5564	2.8891	2.9558	3.0241
8	Spanish	3.8250	4.6863	4.6745	4.7173	4.7238
9	French	4.2913	4.2876	4.0820	4.2913	4.2913
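
The Avg Best and Std Dev rows appear to be the mean and the sample standard deviation (n − 1 denominator) of the per-prompt best rewards. A quick check on the N=100 column, using only values from the table above:

```python
from statistics import mean, stdev

# Per-prompt best reward at N=100 for the standard method (prompts 0-9).
best_at_100 = [4.8220, 5.1613, 2.7075, 5.0811, 4.8282,
               4.8829, 3.9870, 3.0241, 4.7238, 4.2913]

avg_best = mean(best_at_100)   # average across the 10 prompts
std_best = stdev(best_at_100)  # sample standard deviation (n - 1)

print(f"Avg Best = {avg_best:.4f}, Std Dev = {std_best:.4f}")
```

Both values agree with the reported N=100 figures to within rounding. With only 10 prompts the spread is large relative to the gains between N=25 and N=100, so differences of a few hundredths between adjacent N values should be read with caution.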

VoiceCLAP-Large × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.8578	0.9465	0.9788	0.9977	1.0018
Std Dev	0.3412	0.2305	0.1847	0.1892	0.1918
Avg Mean	0.7144	0.7636	0.7474	0.7425	0.7478

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.1195	1.1043	1.1422	1.1702	1.1702
1	French	1.1256	1.1412	1.1412	1.1563	1.1563
2	English	0.5421	0.5575	0.7208	0.7312	0.7312
3	German	1.1796	1.1908	1.1733	1.1936	1.1936
4	French	1.1615	1.1768	1.2063	1.2089	1.2377
5	French	0.1678	0.8056	0.8953	0.8953	0.8953
6	English	0.8742	0.9124	0.9038	0.9403	0.9403
7	German	0.6078	0.6167	0.7145	0.7223	0.7253
8	Spanish	0.7136	0.8812	0.8644	0.8721	0.8820
9	French	1.0866	1.0783	1.0258	1.0866	1.0866

VoiceCLAP-Small × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.6678	0.7458	0.7714	0.7913	0.7979
Std Dev	0.3225	0.2501	0.2388	0.2359	0.2423
Avg Mean	0.5468	0.5890	0.5705	0.5685	0.5713

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	0.7521	0.7519	0.7405	0.7767	0.7767
1	French	0.9093	0.9121	0.9993	0.9993	1.0447
2	English	0.3477	0.4046	0.4725	0.5119	0.5119
3	German	1.0506	1.0508	1.0812	1.0851	1.0851
4	French	0.9676	1.0207	1.0229	1.0462	1.0623
5	French	0.0980	0.4484	0.5258	0.5258	0.5258
6	English	0.6428	0.7003	0.7003	0.7401	0.7401
7	German	0.4432	0.4763	0.5173	0.5226	0.5226
8	Spanish	0.4619	0.6637	0.6329	0.6582	0.6637
9	French	1.0045	1.0291	1.0213	1.0466	1.0466

VoiceCLAP-Large × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.2582	1.4203	1.4750	1.4850	1.5013
Std Dev	0.5172	0.4006	0.3412	0.3477	0.3418
Avg Mean	1.0502	1.1239	1.1059	1.0967	1.1057

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.6427	1.6722	1.7471	1.7397	1.7471
1	French	1.7847	1.8246	1.8036	1.8339	1.8339
2	English	0.6011	0.6206	0.8301	0.7949	0.8301
3	German	1.8622	1.8656	1.8715	1.8756	1.8791
4	French	1.5389	1.5324	1.5389	1.5389	1.5389
5	French	0.3043	1.4622	1.6611	1.6611	1.6611
6	English	1.2764	1.3363	1.3348	1.3881	1.3881
7	German	0.8482	0.8471	1.0189	1.0189	1.0189
8	Spanish	1.3010	1.5888	1.5870	1.5757	1.6226
9	French	1.4229	1.4531	1.3569	1.4229	1.4931

VoiceCLAP-Small × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.7316	0.8069	0.8436	0.8484	0.8696
Std Dev	0.3411	0.2717	0.2451	0.2633	0.2508
Avg Mean	0.6001	0.6379	0.6293	0.6247	0.6280

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.1948	1.2023	1.2259	1.2256	1.2259
1	French	0.7994	0.7983	0.8327	0.8100	0.8544
2	English	0.3176	0.3177	0.3970	0.3843	0.3970
3	German	1.1007	1.1006	1.1086	1.1588	1.1588
4	French	0.7929	0.8113	0.8491	0.8113	0.8632
5	French	0.1659	0.7888	0.8242	0.8637	0.8637
6	English	1.0618	1.0851	1.0802	1.1144	1.1144
7	German	0.5032	0.5129	0.6322	0.5742	0.6322
8	Spanish	0.5634	0.6746	0.7005	0.7043	0.7494
9	French	0.8167	0.7778	0.7856	0.8371	0.8371

v1 Natural (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.9245	1.0314	1.0628	1.0900	1.0968
Std Dev	0.3525	0.2195	0.1794	0.1782	0.1780
Avg Mean	0.7693	0.8212	0.8064	0.8001	0.8062

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.2375	1.2006	1.2722	1.2924	1.2924
1	French	1.2089	1.2266	1.2266	1.2551	1.2551
2	English	0.5982	0.6299	0.7864	0.8460	0.8460
3	German	1.1755	1.1767	1.1561	1.1997	1.1997
4	French	1.1980	1.2235	1.2831	1.2667	1.2893
5	French	0.1907	0.9233	1.0033	1.0033	1.0033
6	English	0.9372	0.9771	0.9757	1.0136	1.0136
7	German	0.7112	0.7354	0.8388	0.8415	0.8468
8	Spanish	0.7690	0.9826	0.9350	0.9497	0.9826
9	French	1.2191	1.2387	1.1507	1.2316	1.2387

v2 Authentic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.9405	1.0561	1.1017	1.1205	1.1271
Std Dev	0.3460	0.2288	0.1716	0.1691	0.1730
Avg Mean	0.7834	0.8410	0.8255	0.8193	0.8261

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.1102	1.0841	1.1505	1.1660	1.1660
1	French	1.2789	1.2996	1.2996	1.3347	1.3347
2	English	0.5769	0.6119	0.7823	0.8180	0.8180
3	German	1.2344	1.2412	1.2337	1.2561	1.2564
4	French	1.2191	1.2360	1.2786	1.2623	1.2850
5	French	0.2084	1.0084	1.1398	1.1398	1.1398
6	English	0.9398	0.9662	1.0042	1.0325	1.0325
7	German	0.7291	0.7394	0.8584	0.8641	0.8641
8	Spanish	0.9287	1.1551	1.1216	1.1382	1.1551
9	French	1.1798	1.2196	1.1483	1.1935	1.2196

v3 Professional (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.8097	0.8920	0.9233	0.9415	0.9450
Std Dev	0.3253	0.2195	0.1760	0.1733	0.1754
Avg Mean	0.6754	0.7206	0.7056	0.7009	0.7059

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.0458	1.0497	1.0755	1.1026	1.1026
1	French	1.0602	1.0708	1.0708	1.0760	1.0909
2	English	0.4986	0.5323	0.6918	0.7076	0.7076
3	German	1.1334	1.1367	1.1302	1.1379	1.1379
4	French	1.1034	1.1154	1.1366	1.1291	1.1409
5	French	0.1556	0.7525	0.8220	0.8220	0.8220
6	English	0.8415	0.8782	0.8718	0.9155	0.9155
7	German	0.5702	0.5700	0.6674	0.6906	0.6906
8	Spanish	0.6748	0.8247	0.8116	0.8207	0.8288
9	French	1.0134	0.9901	0.9548	1.0134	1.0134

v4 Expressive (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.9601	1.0789	1.1310	1.1307	1.1498
Std Dev	0.3556	0.2437	0.1953	0.1896	0.1989
Avg Mean	0.7975	0.8569	0.8392	0.8330	0.8400

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.2292	1.2237	1.3007	1.2799	1.3007
1	French	1.3110	1.3405	1.3405	1.3405	1.3405
2	English	0.5909	0.6097	0.8132	0.8219	0.8219
3	German	1.1202	1.1382	1.1414	1.1530	1.1530
4	French	1.2221	1.2307	1.2634	1.2634	1.3278
5	French	0.2259	1.0865	1.2146	1.2146	1.2146
6	English	1.1729	1.2011	1.2346	1.1959	1.2346
7	German	0.6567	0.6622	0.7654	0.7672	0.7689
8	Spanish	0.9263	1.1515	1.1317	1.1242	1.1518
9	French	1.1462	1.1444	1.1044	1.1462	1.1840

v5 Cinematic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.9141	1.0181	1.0645	1.0716	1.0897
Std Dev	0.3525	0.2444	0.1981	0.1935	0.2040
Avg Mean	0.7530	0.8110	0.7932	0.7873	0.7938

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.0680	1.0384	1.1233	1.0920	1.1233
1	French	1.3144	1.3057	1.2605	1.2766	1.3144
2	English	0.5464	0.5531	0.7303	0.7446	0.7446
3	German	1.1596	1.1531	1.1710	1.1665	1.1763
4	French	1.2114	1.2192	1.2680	1.2438	1.2680
5	French	0.2241	1.0629	1.1896	1.1896	1.1896
6	English	1.0477	1.0780	1.0791	1.1309	1.1309
7	German	0.5975	0.6191	0.7089	0.7196	0.7196
8	Spanish	0.8344	1.0116	1.0107	1.0143	1.0300
9	French	1.1378	1.1401	1.1033	1.1378	1.2003

v6 Natural (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.6865	0.7797	0.7999	0.8265	0.8493
Std Dev	0.3184	0.2540	0.2218	0.2110	0.2365
Avg Mean	0.5611	0.6082	0.5920	0.5884	0.5931

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	0.5042	0.5158	0.5308	0.5636	0.5636
1	French	1.0793	1.0714	1.1328	1.1328	1.1968
2	English	0.3353	0.4223	0.4961	0.5452	0.5452
3	German	0.9399	0.9987	0.9987	0.9915	1.0523
4	French	0.9445	1.0088	0.9445	1.0088	1.0257
5	French	0.1246	0.6050	0.7106	0.7106	0.7106
6	English	0.6721	0.7151	0.7101	0.7898	0.7898
7	German	0.5601	0.5672	0.6213	0.6540	0.6540
8	Spanish	0.6478	0.7696	0.8124	0.8106	0.8322
9	French	1.0571	1.1232	1.0420	1.0576	1.1232

v7 Authentic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.6046	0.6918	0.7222	0.7402	0.7547
Std Dev	0.2713	0.2118	0.2069	0.1993	0.2144
Avg Mean	0.4936	0.5373	0.5238	0.5202	0.5253

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	0.4689	0.4771	0.5077	0.5184	0.5184
1	French	0.9485	0.9631	1.0459	1.0459	1.1041
2	English	0.3231	0.4319	0.5017	0.5160	0.5160
3	German	0.8994	0.9610	0.9610	0.9877	0.9877
4	French	0.7493	0.8390	0.7823	0.8481	0.8738
5	French	0.1199	0.5811	0.6512	0.6512	0.6512
6	English	0.5162	0.5560	0.5574	0.6228	0.6228
7	German	0.4686	0.4701	0.4908	0.5243	0.5243
8	Spanish	0.6786	0.7493	0.8443	0.8149	0.8590
9	French	0.8732	0.8898	0.8794	0.8732	0.8898

v8 Professional (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.5712	0.6338	0.6486	0.6738	0.6775
Std Dev	0.2654	0.2099	0.1875	0.1898	0.1922
Avg Mean	0.4710	0.5010	0.4847	0.4840	0.4869

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	0.8187	0.8392	0.7923	0.8449	0.8449
1	French	0.6998	0.7245	0.7640	0.7640	0.7969
2	English	0.3383	0.3970	0.4577	0.5254	0.5254
3	German	0.8678	0.8678	0.8736	0.8964	0.8964
4	French	0.7916	0.8587	0.8389	0.8784	0.8815
5	French	0.0688	0.3298	0.3682	0.3682	0.3682
6	English	0.5726	0.6132	0.6256	0.6504	0.6504
7	German	0.3901	0.3964	0.4588	0.4584	0.4588
8	Spanish	0.3822	0.5239	0.5052	0.5474	0.5474
9	French	0.7822	0.7872	0.8016	0.8049	0.8049

v9 Natural−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.6291	1.8186	1.8798	1.9103	1.9195
Std Dev	0.6080	0.4153	0.3251	0.3118	0.3149
Avg Mean	1.3579	1.4514	1.4249	1.4155	1.4251

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.0727	2.0516	2.1587	2.1570	2.1587
1	French	2.1453	2.1363	2.1517	2.1581	2.1581
2	English	0.9398	0.9151	1.1917	1.2604	1.2604
3	German	2.1463	2.1568	2.1518	2.1776	2.1776
4	French	2.0452	2.1136	2.1586	2.1147	2.1586
5	French	0.3628	1.7799	1.9490	1.9490	1.9490
6	English	1.6935	1.7594	1.7594	1.8424	1.8424
7	German	1.2761	1.2796	1.4900	1.4900	1.4942
8	Spanish	1.5271	1.8963	1.8559	1.8640	1.8981
9	French	2.0823	2.0977	1.9308	2.0902	2.0977

v10 Authentic−Cheap (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.6832	1.8913	1.9613	1.9899	1.9996
Std Dev	0.6369	0.4517	0.3553	0.3491	0.3538
Avg Mean	1.3981	1.5046	1.4762	1.4664	1.4776

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.8825	1.8513	1.9756	1.9565	1.9756
1	French	2.3525	2.3542	2.3615	2.3857	2.3857
2	English	0.9429	0.9609	1.2536	1.3073	1.3073
3	German	2.2178	2.2171	2.2153	2.2496	2.2496
4	French	2.2110	2.2467	2.2774	2.2444	2.2774
5	French	0.3976	1.9374	2.1557	2.1557	2.1557
6	English	1.7025	1.7518	1.7646	1.8343	1.8343
7	German	1.2646	1.2909	1.4969	1.4950	1.4969
8	Spanish	1.7063	2.1008	2.0683	2.1015	2.1124
9	French	2.1543	2.2016	2.0443	2.1690	2.2016

v11 Professional−Distorted (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.6093	1.7896	1.8517	1.8897	1.8934
Std Dev	0.6226	0.4050	0.3213	0.3086	0.3125
Avg Mean	1.3334	1.4292	1.4000	1.3909	1.4017

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.0551	2.0254	2.1096	2.1381	2.1381
1	French	2.1536	2.1758	2.1758	2.1941	2.1941
2	English	0.9734	1.0393	1.3434	1.3864	1.3864
3	German	2.0874	2.0829	2.0547	2.0935	2.0935
4	French	2.1538	2.1594	2.2287	2.2064	2.2405
5	French	0.3494	1.6953	1.8563	1.8563	1.8563
6	English	1.7036	1.7410	1.7122	1.7939	1.7939
7	German	1.1109	1.1450	1.3212	1.3771	1.3771
8	Spanish	1.4541	1.7996	1.7722	1.7988	1.8021
9	French	2.0519	2.0325	1.9430	2.0519	2.0519

v12 Expressive−Flat (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.6379	1.8253	1.9079	1.9154	1.9357
Std Dev	0.6148	0.4403	0.3590	0.3562	0.3609
Avg Mean	1.3553	1.4549	1.4256	1.4162	1.4266

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.1003	2.1003	2.2189	2.1747	2.2189
1	French	2.2520	2.2560	2.2560	2.2560	2.2573
2	English	0.9066	0.8806	1.1945	1.1769	1.1945
3	German	2.0319	2.0390	2.0693	2.0693	2.0693
4	French	2.0438	2.0554	2.1211	2.0741	2.1539
5	French	0.3928	1.9088	2.1245	2.1245	2.1245
6	English	1.9189	1.9542	1.9603	2.0076	2.0076
7	German	1.1688	1.1635	1.3594	1.3594	1.3629
8	Spanish	1.6018	1.9741	1.9580	1.9491	1.9909
9	French	1.9623	1.9208	1.8171	1.9623	1.9769

v13 FullPos−FullNeg (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.6227	1.8137	1.8707	1.9040	1.9110
Std Dev	0.6338	0.4351	0.3488	0.3464	0.3495
Avg Mean	1.3515	1.4499	1.4210	1.4121	1.4221

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.0521	2.0285	2.1049	2.1275	2.1275
1	French	2.2797	2.2929	2.2929	2.3011	2.3011
2	English	0.9320	0.9520	1.2422	1.2692	1.2692
3	German	2.1127	2.1286	2.0941	2.1396	2.1396
4	French	2.1043	2.1308	2.2110	2.1680	2.2110
5	French	0.3648	1.7720	1.9389	1.9389	1.9389
6	English	1.6790	1.7274	1.7274	1.8241	1.8241
7	German	1.1466	1.1816	1.3573	1.3670	1.3670
8	Spanish	1.4695	1.8327	1.7913	1.8047	1.8327
9	French	2.0864	2.0903	1.9472	2.0994	2.0994

v14 Warm−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.5354	1.7098	1.7688	1.7972	1.8013
Std Dev	0.5916	0.4193	0.3370	0.3345	0.3334
Avg Mean	1.2830	1.3725	1.3460	1.3376	1.3464

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.9702	1.9651	2.0625	2.0661	2.0661
1	French	2.0386	2.0441	2.0573	2.0598	2.0598
2	English	0.8733	0.8413	1.1193	1.1351	1.1351
3	German	2.1652	2.1834	2.1676	2.1864	2.1864
4	French	1.9097	1.9656	2.0153	2.0034	2.0153
5	French	0.3372	1.6433	1.8096	1.8096	1.8096
6	English	1.6043	1.6463	1.6331	1.7261	1.7261
7	German	1.1345	1.1494	1.3330	1.3330	1.3443
8	Spanish	1.4368	1.7590	1.7458	1.7529	1.7695
9	French	1.8838	1.9009	1.7442	1.8999	1.9009

v15 Natural−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.6384	1.8358	1.9058	1.9259	1.9414
Std Dev	0.6687	0.4751	0.4045	0.3996	0.4093
Avg Mean	1.3570	1.4600	1.4272	1.4173	1.4276

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.6948	1.7089	1.7841	1.8101	1.8101
1	French	2.3612	2.3268	2.4076	2.3951	2.4527
2	English	0.8562	0.9145	1.1865	1.1870	1.1870
3	German	2.2969	2.3188	2.3448	2.3448	2.3549
4	French	2.1632	2.2464	2.2989	2.2786	2.3003
5	French	0.3583	1.7506	1.9350	1.9350	1.9350
6	English	1.6269	1.7462	1.7045	1.7911	1.7911
7	German	1.2255	1.2429	1.3957	1.4453	1.4453
8	Spanish	1.5602	1.8737	1.8971	1.8309	1.8971
9	French	2.2408	2.2295	2.1041	2.2408	2.2408

v16 Authentic−Cheap (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.6024	1.8099	1.8977	1.9041	1.9280
Std Dev	0.6573	0.5061	0.4549	0.4405	0.4600
Avg Mean	1.3129	1.4173	1.3900	1.3784	1.3898

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.4668	1.4293	1.5561	1.5558	1.5561
1	French	2.4247	2.4184	2.4913	2.4283	2.5349
2	English	0.8353	0.8968	1.1359	1.1275	1.1359
3	German	2.3880	2.4561	2.4757	2.4860	2.4914
4	French	1.9562	2.0757	2.1254	2.0757	2.1254
5	French	0.3954	1.9223	2.0922	2.0922	2.0922
6	English	1.5027	1.6135	1.6135	1.7000	1.7000
7	German	1.2232	1.2393	1.3968	1.4144	1.4144
8	Spanish	1.7489	2.0128	2.1290	2.0783	2.1461
9	French	2.0831	2.0348	1.9611	2.0831	2.0831

v17 Professional−Distorted (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.4980	1.6583	1.7254	1.7480	1.7686
Std Dev	0.6035	0.4475	0.3630	0.3722	0.3740
Avg Mean	1.2447	1.3336	1.3066	1.2987	1.3075

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.9541	1.9454	2.0261	2.0680	2.0680
1	French	1.9884	2.0251	2.0264	2.0264	2.0895
2	English	0.8184	0.7697	1.0209	1.0489	1.0489
3	German	2.0171	2.0510	2.0893	2.0641	2.0893
4	French	1.9795	2.0700	2.0562	2.0887	2.1237
5	French	0.3070	1.4988	1.6613	1.6613	1.6613
6	English	1.6094	1.6094	1.6682	1.7287	1.7287
7	German	1.0425	1.0444	1.2497	1.1923	1.2497
8	Spanish	1.3218	1.6518	1.5991	1.6594	1.6852
9	French	1.9419	1.9171	1.8572	1.9419	1.9419

v18 Expressive−Flat (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.5510	1.7345	1.8091	1.8207	1.8629
Std Dev	0.5825	0.4153	0.3175	0.3181	0.3351
Avg Mean	1.2740	1.3556	1.3330	1.3252	1.3340

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.9827	2.0572	2.0572	2.0621	2.0621
1	French	2.0235	2.0021	2.0021	2.0021	2.1373
2	English	0.8092	0.8225	1.1438	1.1028	1.1438
3	German	1.9407	1.9619	1.9888	1.9619	2.0188
4	French	1.9734	1.9927	2.1350	1.9925	2.1509
5	French	0.3588	1.6840	1.8924	1.9407	1.9407
6	English	1.7205	1.7957	1.7525	1.8148	1.8207
7	German	1.1635	1.1604	1.3597	1.3820	1.3820
8	Spanish	1.5839	2.0063	1.9304	1.9938	2.0184
9	French	1.9541	1.8626	1.8286	1.9541	1.9541

v19 FullPos−FullNeg (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.4562	1.6334	1.6862	1.7129	1.7295
Std Dev	0.5568	0.3881	0.3136	0.3103	0.3237
Avg Mean	1.2152	1.2975	1.2696	1.2609	1.2708

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.7601	1.7635	1.8438	1.9148	1.9148
1	French	1.9580	2.0140	2.0440	2.0440	2.0877
2	English	0.8341	0.8534	1.1086	1.1041	1.1086
3	German	1.9933	2.0247	2.0218	2.0133	2.0537
4	French	1.8444	1.9298	1.9589	1.9117	1.9713
5	French	0.3026	1.5087	1.6523	1.6523	1.6523
6	English	1.4453	1.5188	1.5103	1.5855	1.5855
7	German	1.1276	1.1279	1.2679	1.3026	1.3026
8	Spanish	1.4169	1.7389	1.7025	1.7210	1.7389
9	French	1.8794	1.8542	1.7517	1.8794	1.8794

v20 Warm−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.5747	1.7640	1.8339	1.8643	1.8857
Std Dev	0.6634	0.4736	0.4052	0.4185	0.4148
Avg Mean	1.3063	1.3998	1.3689	1.3601	1.3679

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.0071	1.9769	2.0496	2.0496	2.0648
1	French	2.0524	1.9854	2.1039	2.1039	2.1665
2	English	0.8100	0.8181	1.1035	1.0783	1.1035
3	German	2.2472	2.2402	2.2904	2.3111	2.3111
4	French	2.1160	2.2734	2.3291	2.3291	2.3291
5	French	0.3379	1.6241	1.7747	1.7921	1.7921
6	English	1.5238	1.6897	1.6237	1.7048	1.7530
7	German	1.0790	1.1274	1.3161	1.3248	1.3248
8	Spanish	1.3700	1.8080	1.7130	1.7451	1.8080
9	French	2.2038	2.0969	2.0352	2.2038	2.2038

v21 Sanitized Prompt (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.0507	1.1807	1.2371	1.2467	1.2569
Std Dev	0.4170	0.2947	0.2464	0.2569	0.2516
Avg Mean	0.8718	0.9292	0.9176	0.9094	0.9166

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.4841	1.4532	1.5517	1.5681	1.5681
1	French	1.4061	1.4510	1.4510	1.4510	1.4514
2	English	0.5612	0.5944	0.7916	0.7633	0.7916
3	German	1.4112	1.3926	1.3812	1.3973	1.4112
4	French	1.2528	1.2476	1.2528	1.2528	1.2528
5	French	0.2587	1.2370	1.4143	1.4143	1.4143
6	English	1.1883	1.2423	1.2546	1.2964	1.2964
7	German	0.6933	0.7077	0.8530	0.8530	0.8530
8	Spanish	0.9591	1.2018	1.1831	1.1783	1.2158
9	French	1.2925	1.2797	1.2373	1.2925	1.3148

v22 Sanitized Prompt (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	0.7210	0.8015	0.8351	0.8340	0.8599
Std Dev	0.3382	0.2701	0.2511	0.2633	0.2540
Avg Mean	0.5833	0.6207	0.6144	0.6093	0.6126

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	1.2427	1.2207	1.2508	1.2726	1.2726
1	French	0.8132	0.8168	0.8185	0.8142	0.8443
2	English	0.3167	0.3229	0.3946	0.3738	0.3946
3	German	1.0059	1.0059	1.0218	1.0087	1.0545
4	French	0.7387	0.7494	0.7543	0.7494	0.8121
5	French	0.1730	0.8228	0.8774	0.8983	0.8983
6	English	1.0897	1.1302	1.1430	1.1430	1.1510
7	German	0.5155	0.5104	0.6380	0.5744	0.6380
8	Spanish	0.5646	0.7145	0.7284	0.7256	0.7531
9	French	0.7502	0.7212	0.7243	0.7803	0.7803

v23 Sanitized−Uncanny (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.7781	1.9943	2.0676	2.0926	2.1024
Std Dev	0.7034	0.5020	0.4095	0.4193	0.4152
Avg Mean	1.4779	1.5784	1.5535	1.5420	1.5532

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.4311	2.4023	2.5125	2.5265	2.5265
1	French	2.4019	2.4348	2.4348	2.4348	2.4348
2	English	0.9111	0.9375	1.2582	1.2379	1.2582
3	German	2.3902	2.3837	2.3624	2.3877	2.3903
4	French	2.1664	2.1809	2.1879	2.1664	2.1879
5	French	0.4257	2.0618	2.3068	2.3068	2.3068
6	English	1.9406	2.0147	2.0267	2.1296	2.1296
7	German	1.2215	1.2507	1.4666	1.4666	1.4666
8	Spanish	1.6835	2.0902	2.0623	2.0603	2.1141
9	French	2.2091	2.1869	2.0582	2.2091	2.2091
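
The diminishing-returns pattern in the Avg Best row can be made explicit by computing the gain for each step up in N and the gain per additional candidate. A short Python sketch using the v23 Avg Best values above; the helper name `marginal_gains` is illustrative.

```python
# Avg Best row for v23, copied from the table above.
n_values = [5, 10, 25, 50, 100]
avg_best = [1.7781, 1.9943, 2.0676, 2.0926, 2.1024]

def marginal_gains(ns, rewards):
    """For each step up in N, return (n_from, n_to, gain, gain per extra candidate)."""
    steps = []
    pairs = list(zip(ns, rewards))
    for (n0, r0), (n1, r1) in zip(pairs, pairs[1:]):
        gain = r1 - r0
        steps.append((n0, n1, gain, gain / (n1 - n0)))
    return steps

for n0, n1, gain, per_candidate in marginal_gains(n_values, avg_best):
    print(f"N={n0}->{n1}: +{gain:.4f} total, +{per_candidate:.5f} per extra candidate")
```

Going from N=5 to N=10 adds 0.2162 to the average best reward, while doubling from N=50 to N=100 adds only 0.0098.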

v24 Sanitized−Uncanny (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

	N=5	N=10	N=25	N=50	N=100
Avg Best	1.5440	1.7067	1.7908	1.7898	1.8203
Std Dev	0.6416	0.5003	0.4445	0.4564	0.4510
Avg Mean	1.2747	1.3525	1.3344	1.3243	1.3333

Per-Prompt Best Reward by N

#	Lang	N=5	N=10	N=25	N=50	N=100
0	English	2.2525	2.2736	2.3386	2.3802	2.3802
1	French	1.9994	2.0276	2.0112	1.9730	2.0276
2	English	0.6968	0.6609	0.8407	0.8349	0.8407
3	German	2.2097	2.2192	2.2531	2.2524	2.2531
4	French	1.7903	1.8595	1.9609	1.8388	1.9651
5	French	0.3583	1.7593	1.8678	1.9718	1.9718
6	English	1.9158	1.9608	1.9931	2.0007	2.0080
7	German	1.1425	1.1106	1.3516	1.2785	1.3516
8	Spanish	1.3532	1.5629	1.6131	1.6459	1.6833
9	French	1.7219	1.6331	1.6775	1.7219	1.7219

Prompt #0 — English (Silicon Valley accent)

Language: English · Accent: Silicon Valley accent · Scored: 100/100

DramaBox Prompt

A young woman, possessing an extremely high fundamental frequency and bright, delicate harmonic texture, with a brisk, elevated momentum and a Silicon Valley accent; this is a pristine, high-quality studio voice recording with no background noise. She delivers the lines with a teasing lightness that occasionally borders on nervous energy, punctuated by small moments of genuine relief. (A brief, high-pitched Giggle escapes as she begins.) "Honestly, you think finding a solid Firestone review is that hard? Boggle, really. But look, that Lys thing actually worked." (She pauses, a subtle Contemplation washing over her features, then manages a slight, contained Chuckle.) "Just wait, I'll show you." She concludes with a soft, almost satisfied sigh, allowing the tension to dissipate.

Prompt #1 — French

Language: French · Scored: 100/100

DramaBox Prompt

High-pitched, delicately resonant, and possessing the slightly strained clarity of a young adult female soprano; the voice is bright and purely head-dominant, engineered for intimate projection.

Pauses briefly, gathering strength. "Malgré la profondeur de cette sombre forêt, je sens toujours cette confiance absolue en mon chemin, guidée par la lumière."
A slight, almost imperceptible hardening of tone. "Même au cœur de cette nuit insondable, ma boussole intérieure me montre la seule direction véritable."
She finishes, a note of unwavering certainty settling.

The pace remains glacially slow throughout the utterance. The delivery conveys immense, quiet self-assurance.
