Scenema Audio — Best-of-N: Diminishing Returns

10 Path A prompts × 100 candidates — 29 ranking methods — Page 1/5

Methodology: Ranking Method Formulas & Text Prompts

All methods use reward = (1 − WER) × max(score, 0). The score varies per method as described below.
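As a minimal sketch of the formula above, the reward and the best-of-N pick could be written as follows. The function names and the candidate dict layout are illustrative assumptions, not taken from the Scenema codebase.

```python
def reward(wer: float, score: float) -> float:
    """reward = (1 - WER) * max(score, 0), as stated in the methodology."""
    # Negative scores are floored at zero. WER is used as given:
    # the source formula does not mention any clamping of WER.
    return (1.0 - wer) * max(score, 0.0)


def best_of_n(candidates: list) -> dict:
    """Return the candidate with the highest reward.

    Each candidate is assumed to be a dict with 'wer' and 'score' keys
    (a hypothetical layout chosen for this example).
    """
    return max(candidates, key=lambda c: reward(c["wer"], c["score"]))


if __name__ == "__main__":
    pool = [
        {"id": "a", "wer": 0.10, "score": 0.80},   # reward ~0.72
        {"id": "b", "wer": 0.00, "score": 0.70},   # reward 0.70
        {"id": "c", "wer": 0.05, "score": -0.20},  # reward 0.00 (score floored)
    ]
    print(best_of_n(pool)["id"])  # a
```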

# | Key | Score Formula | Text Prompt(s)
0 | standard | Content Enjoyment |
1 | clap_lq | cos(audio, quality_text) | "pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
2 | clap_sq | cos(audio, quality_text) | "pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
3 | clap_lp | cos(audio, prompt) | Original DramaBox prompt
4 | clap_sp | cos(audio, prompt) | Original DramaBox prompt
5 | v1_nat_L | cos(audio, nat) | "natural, spontaneous, lifelike speech with genuine emotion"
6 | v2_auth_L | cos(audio, auth) | "authentic, emotionally truthful, deeply felt voice performance"
7 | v3_pro_L | cos(audio, pro) | "professional studio recording, crystal clear high-fidelity audio"
8 | v4_expr_L | cos(audio, expr) | "expressive, dynamic voice acting with rich emotional range"
9 | v5_cine_L | cos(audio, cine) | "immersive cinematic narration, compelling storytelling"
10 | v6_nat_S | cos(audio, nat) | "natural, spontaneous, lifelike speech with genuine emotion"
11 | v7_auth_S | cos(audio, auth) | "authentic, emotionally truthful, deeply felt voice performance"
12 | v8_pro_S | cos(audio, pro) | "professional studio recording, crystal clear high-fidelity audio"
13 | v9_nr_L | cos(audio, nat) − cos(audio, rob) | + "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
14 | v10_ac_L | cos(audio, auth) − cos(audio, cheap) | + "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
15 | v11_pd_L | cos(audio, pro) − cos(audio, dist) | + "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
16 | v12_ef_L | cos(audio, expr) − cos(audio, flat) | + "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
17 | v13_ff_L | cos(audio, full_pos) − cos(audio, full_neg) | + "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
18 | v14_wr_L | cos(audio, warm) − cos(audio, rob) | + "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
19 | v15_nr_S | cos(audio, nat) − cos(audio, rob) | + "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
20 | v16_ac_S | cos(audio, auth) − cos(audio, cheap) | + "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
21 | v17_pd_S | cos(audio, pro) − cos(audio, dist) | + "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
22 | v18_ef_S | cos(audio, expr) − cos(audio, flat) | + "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
23 | v19_ff_S | cos(audio, full_pos) − cos(audio, full_neg) | + "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
24 | v20_wr_S | cos(audio, warm) − cos(audio, rob) | + "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
25 | v21_san_L | cos(audio, sanitized_prompt) | Quoted speech removed (Large)
26 | v22_san_S | cos(audio, sanitized_prompt) | Quoted speech removed (Small)
27 | v23_snr_L | cos(audio, sanitized) − cos(audio, neg_san) | Sanitized / − "robotic, distorted, uncanny" (Large)
28 | v24_snr_S | cos(audio, sanitized) − cos(audio, neg_san) | Sanitized / − "robotic, distorted, uncanny" (Small)
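The two score shapes in the table (a single cosine similarity, and a positive-minus-negative difference) can be sketched as below. This assumes the audio and text embeddings have already been produced by the VoiceCLAP encoders and are available as plain lists of floats; the encoder calls themselves are not shown, and all function names here are hypothetical.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def positive_score(audio: list, pos_text: list) -> float:
    """Positive-only methods, e.g. cos(audio, nat)."""
    return cosine(audio, pos_text)


def contrastive_score(audio: list, pos_text: list, neg_text: list) -> float:
    """Pos−Neg methods, e.g. cos(audio, nat) − cos(audio, rob)."""
    return cosine(audio, pos_text) - cosine(audio, neg_text)


if __name__ == "__main__":
    # Toy 2-D embeddings, chosen only to make the arithmetic obvious.
    audio = [1.0, 0.0]
    natural = [1.0, 0.0]   # identical direction -> cosine 1.0
    robotic = [0.0, 1.0]   # orthogonal          -> cosine 0.0
    print(contrastive_score(audio, natural, robotic))  # 1.0
```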

Cross-Method Diminishing Returns Comparison

Method N=5 N=10 N=25 N=50 N=100 Gain (N=5→100) Knee Point
Standard: (1−WER) × Content Enjoyment 4.0428 4.1408 4.3040 4.3142 4.3628 +0.3199 N=50
VoiceCLAP-Large × Quality Text 0.9382 0.9669 0.9912 1.0029 1.0105 +0.0723 N=50
VoiceCLAP-Small × Quality Text 0.7145 0.7557 0.7905 0.7989 0.8163 +0.1018 N=50
VoiceCLAP-Large × Prompt Match 1.3999 1.4481 1.4806 1.4939 1.5071 +0.1072 N=50
VoiceCLAP-Small × Prompt Match 0.7935 0.8272 0.8358 0.8533 0.8745 +0.0811 N=25
v1 Natural (Large) 1.0182 1.0507 1.0753 1.0955 1.1007 +0.0825 N=50
v2 Authentic (Large) 1.0417 1.0757 1.1070 1.1228 1.1283 +0.0866 N=50
v3 Professional (Large) 0.8839 0.9104 0.9344 0.9464 0.9553 +0.0714 N=50
v4 Expressive (Large) 1.0674 1.0989 1.1503 1.1399 1.1609 +0.0936 N=50
v5 Cinematic (Large) 1.0103 1.0374 1.0845 1.0736 1.0972 +0.0869 N=50
v6 Natural (Small) 0.7287 0.7761 0.8128 0.8190 0.8507 +0.1220 N=50
v7 Authentic (Small) 0.6651 0.7163 0.7486 0.7537 0.7703 +0.1053 N=50
v8 Professional (Small) 0.6109 0.6477 0.6647 0.6831 0.7008 +0.0899 N=100
v9 Natural−Robotic (Large) 1.7977 1.8396 1.8937 1.9131 1.9235 +0.1258 N=50
v10 Authentic−Cheap (Large) 1.8655 1.9179 1.9764 1.9911 2.0030 +0.1375 N=50
v11 Professional−Distorted (Large) 1.7750 1.8247 1.8765 1.8977 1.9133 +0.1383 N=50
v12 Expressive−Flat (Large) 1.8149 1.8563 1.9296 1.9202 1.9482 +0.1333 N=50
v13 FullPos−FullNeg (Large) 1.7943 1.8421 1.8963 1.9123 1.9265 +0.1321 N=50
v14 Warm−Robotic (Large) 1.6974 1.7359 1.7887 1.8043 1.8119 +0.1145 N=50
v15 Natural−Robotic (Small) 1.8026 1.8541 1.9326 1.9311 1.9696 +0.1670 N=50
v16 Authentic−Cheap (Small) 1.7817 1.8385 1.9087 1.9118 1.9354 +0.1537 N=50
v17 Professional−Distorted (Small) 1.6513 1.7063 1.7464 1.7580 1.7837 +0.1324 N=50
v18 Expressive−Flat (Small) 1.7013 1.7937 1.8276 1.8426 1.8982 +0.1969 N=50
v19 FullPos−FullNeg (Small) 1.6088 1.6750 1.7174 1.7289 1.7616 +0.1528 N=50
v20 Warm−Robotic (Small) 1.7561 1.7931 1.8711 1.8821 1.9198 +0.1637 N=50
v21 Sanitized Prompt (Large) 1.1625 1.1956 1.2300 1.2426 1.2543 +0.0918 N=50
v22 Sanitized Prompt (Small) 0.7789 0.8160 0.8245 0.8353 0.8623 +0.0834 N=25
v23 Sanitized−Uncanny (Large) 1.9633 2.0168 2.0701 2.0903 2.0995 +0.1362 N=50
v24 Sanitized−Uncanny (Small) 1.6995 1.7653 1.7926 1.8202 1.8531 +0.1536 N=50

Diminishing Returns — All Methods Overlaid

[Figure: six panels, one line per method, x-axis "N candidates" (N=5, 10, 25, 50, 100). Panels: Original (5); Positive-only, Large (5); Positive-only, Small (3); Pos−Neg, Large (6); Pos−Neg, Small (6); Sanitized Prompt (4).]

Marginal Improvement per Additional Candidate

Method N=5→10 N=10→25 N=25→50 N=50→100
Standard: (1−WER) × Content Enjoyment 0.01959/cand (2.4%) 0.01088/cand (3.9%) 0.00041/cand (0.2%) 0.00097/cand (1.1%)
VoiceCLAP-Large × Quality Text 0.00572/cand (3.1%) 0.00162/cand (2.5%) 0.00047/cand (1.2%) 0.00015/cand (0.8%)
VoiceCLAP-Small × Quality Text 0.00823/cand (5.8%) 0.00232/cand (4.6%) 0.00033/cand (1.1%) 0.00035/cand (2.2%)
VoiceCLAP-Large × Prompt Match 0.00965/cand (3.4%) 0.00216/cand (2.2%) 0.00053/cand (0.9%) 0.00026/cand (0.9%)
VoiceCLAP-Small × Prompt Match 0.00676/cand (4.3%) 0.00057/cand (1.0%) 0.00070/cand (2.1%) 0.00043/cand (2.5%)
v1 Natural (Large) 0.00650/cand (3.2%) 0.00164/cand (2.3%) 0.00081/cand (1.9%) 0.00010/cand (0.5%)
v2 Authentic (Large) 0.00679/cand (3.3%) 0.00209/cand (2.9%) 0.00063/cand (1.4%) 0.00011/cand (0.5%)
v3 Professional (Large) 0.00531/cand (3.0%) 0.00160/cand (2.6%) 0.00048/cand (1.3%) 0.00018/cand (0.9%)
v4 Expressive (Large) 0.00630/cand (3.0%) 0.00343/cand (4.7%) -0.00042/cand (-0.9%) 0.00042/cand (1.8%)
v5 Cinematic (Large) 0.00543/cand (2.7%) 0.00314/cand (4.5%) -0.00044/cand (-1.0%) 0.00047/cand (2.2%)
v6 Natural (Small) 0.00948/cand (6.5%) 0.00244/cand (4.7%) 0.00025/cand (0.8%) 0.00063/cand (3.9%)
v7 Authentic (Small) 0.01026/cand (7.7%) 0.00215/cand (4.5%) 0.00020/cand (0.7%) 0.00033/cand (2.2%)
v8 Professional (Small) 0.00736/cand (6.0%) 0.00113/cand (2.6%) 0.00074/cand (2.8%) 0.00035/cand (2.6%)
v9 Natural−Robotic (Large) 0.00837/cand (2.3%) 0.00361/cand (2.9%) 0.00077/cand (1.0%) 0.00021/cand (0.5%)
v10 Authentic−Cheap (Large) 0.01047/cand (2.8%) 0.00390/cand (3.0%) 0.00059/cand (0.7%) 0.00024/cand (0.6%)
v11 Professional−Distorted (Large) 0.00994/cand (2.8%) 0.00346/cand (2.8%) 0.00085/cand (1.1%) 0.00031/cand (0.8%)
v12 Expressive−Flat (Large) 0.00828/cand (2.3%) 0.00489/cand (3.9%) -0.00037/cand (-0.5%) 0.00056/cand (1.5%)
v13 FullPos−FullNeg (Large) 0.00956/cand (2.7%) 0.00361/cand (2.9%) 0.00064/cand (0.8%) 0.00028/cand (0.7%)
v14 Warm−Robotic (Large) 0.00770/cand (2.3%) 0.00352/cand (3.0%) 0.00062/cand (0.9%) 0.00015/cand (0.4%)
v15 Natural−Robotic (Small) 0.01031/cand (2.9%) 0.00523/cand (4.2%) -0.00006/cand (-0.1%) 0.00077/cand (2.0%)
v16 Authentic−Cheap (Small) 0.01137/cand (3.2%) 0.00468/cand (3.8%) 0.00012/cand (0.2%) 0.00047/cand (1.2%)
v17 Professional−Distorted (Small) 0.01100/cand (3.3%) 0.00267/cand (2.4%) 0.00046/cand (0.7%) 0.00051/cand (1.5%)
v18 Expressive−Flat (Small) 0.01849/cand (5.4%) 0.00226/cand (1.9%) 0.00060/cand (0.8%) 0.00111/cand (3.0%)
v19 FullPos−FullNeg (Small) 0.01325/cand (4.1%) 0.00283/cand (2.5%) 0.00046/cand (0.7%) 0.00065/cand (1.9%)
v20 Warm−Robotic (Small) 0.00740/cand (2.1%) 0.00520/cand (4.3%) 0.00044/cand (0.6%) 0.00075/cand (2.0%)
v21 Sanitized Prompt (Large) 0.00663/cand (2.9%) 0.00229/cand (2.9%) 0.00050/cand (1.0%) 0.00023/cand (0.9%)
v22 Sanitized Prompt (Small) 0.00742/cand (4.8%) 0.00056/cand (1.0%) 0.00043/cand (1.3%) 0.00054/cand (3.2%)
v23 Sanitized−Uncanny (Large) 0.01069/cand (2.7%) 0.00355/cand (2.6%) 0.00081/cand (1.0%) 0.00018/cand (0.4%)
v24 Sanitized−Uncanny (Small) 0.01315/cand (3.9%) 0.00182/cand (1.5%) 0.00110/cand (1.5%) 0.00066/cand (1.8%)
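The entries in this table are consistent with two simple quantities per interval: the gain in average best reward divided by the number of added candidates, and the same gain as a percentage of the starting value. The sketch below recomputes them for the Standard method from the 4-decimal values in the cross-method table; the function name is illustrative.

```python
def marginal_gains(ns: list, best: list) -> list:
    """Per-interval (label, gain per added candidate, relative gain in %)."""
    rows = []
    for n0, n1, b0, b1 in zip(ns, ns[1:], best, best[1:]):
        per_cand = (b1 - b0) / (n1 - n0)   # absolute gain per extra candidate
        pct = 100.0 * (b1 - b0) / b0       # gain relative to the starting value
        rows.append((f"N={n0}→{n1}", per_cand, pct))
    return rows


if __name__ == "__main__":
    ns = [5, 10, 25, 50, 100]
    # Avg Best for "Standard: (1−WER) × Content Enjoyment"
    standard = [4.0428, 4.1408, 4.3040, 4.3142, 4.3628]
    for label, per_cand, pct in marginal_gains(ns, standard):
        print(f"{label}: {per_cand:.5f}/cand ({pct:.1f}%)")
```

Because the inputs are already rounded to four decimals, the last printed digit can differ slightly from the table (for example 0.01960 versus 0.01959 for N=5→10).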

Ablation: Pronunciation Suffix Effect

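This table compares best-of-10 selection without and with the pronunciation suffix. Each delta is the with-suffix value minus the without-suffix value.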

Ranking Method | Mean (no suffix) | Best (no suffix) | Median (no suffix) | Mean (with suffix) | Best (with suffix) | Median (with suffix) | Δ Mean | Δ Best
Standard: (1−WER) × Content Enjoyment | 3.3615 | 4.1847 | 3.6175 | 3.3763 | 3.9955 | 3.5269 | +0.0149 | -0.1891
VoiceCLAP-Large × Quality Text | 0.7838 | 0.9695 | 0.8348 | 0.7841 | 0.9143 | 0.8175 | +0.0003 | -0.0552
VoiceCLAP-Small × Quality Text | 0.5774 | 0.7412 | 0.6003 | 0.5810 | 0.7199 | 0.6050 | +0.0036 | -0.0213
VoiceCLAP-Large × Prompt Match | 1.1613 | 1.4424 | 1.2490 | 1.1714 | 1.3765 | 1.2262 | +0.0101 | -0.0659
VoiceCLAP-Small × Prompt Match | 0.6495 | 0.8131 | 0.6829 | 0.6531 | 0.7878 | 0.6749 | +0.0036 | -0.0253
v1 Natural (Large) | 0.8459 | 1.0597 | 0.9060 | 0.8467 | 0.9954 | 0.8844 | +0.0008 | -0.0643
v2 Authentic (Large) | 0.8674 | 1.0861 | 0.9346 | 0.8735 | 1.0270 | 0.9118 | +0.0061 | -0.0591
v3 Professional (Large) | 0.7392 | 0.9144 | 0.7867 | 0.7382 | 0.8605 | 0.7725 | -0.0010 | -0.0539
v4 Expressive (Large) | 0.8872 | 1.1027 | 0.9527 | 0.8911 | 1.0474 | 0.9281 | +0.0039 | -0.0553
v5 Cinematic (Large) | 0.8382 | 1.0399 | 0.9002 | 0.8417 | 0.9907 | 0.8810 | +0.0034 | -0.0492
v6 Natural (Small) | 0.6090 | 0.7785 | 0.6431 | 0.6235 | 0.7449 | 0.6467 | +0.0144 | -0.0336
v7 Authentic (Small) | 0.5564 | 0.7202 | 0.5876 | 0.5714 | 0.6861 | 0.5949 | +0.0151 | -0.0341
v8 Professional (Small) | 0.4890 | 0.6297 | 0.5065 | 0.4877 | 0.6010 | 0.5041 | -0.0013 | -0.0286
v9 Natural−Robotic (Large) | 1.4963 | 1.8591 | 1.6107 | 1.5032 | 1.7597 | 1.5698 | +0.0069 | -0.0994
v10 Authentic−Cheap (Large) | 1.5558 | 1.9359 | 1.6813 | 1.5684 | 1.8436 | 1.6392 | +0.0126 | -0.0923
v11 Professional−Distorted (Large) | 1.4769 | 1.8386 | 1.5808 | 1.4800 | 1.7328 | 1.5483 | +0.0031 | -0.1057
v12 Expressive−Flat (Large) | 1.5067 | 1.8652 | 1.6254 | 1.5134 | 1.7795 | 1.5797 | +0.0067 | -0.0857
v13 FullPos−FullNeg (Large) | 1.4975 | 1.8539 | 1.6158 | 1.5039 | 1.7564 | 1.5741 | +0.0064 | -0.0975
v14 Warm−Robotic (Large) | 1.4152 | 1.7466 | 1.5236 | 1.4226 | 1.6600 | 1.4859 | +0.0074 | -0.0866
v15 Natural−Robotic (Small) | 1.5035 | 1.8746 | 1.6153 | 1.5182 | 1.7863 | 1.5854 | +0.0147 | -0.0883
v16 Authentic−Cheap (Small) | 1.4822 | 1.8690 | 1.5956 | 1.5071 | 1.7842 | 1.5690 | +0.0249 | -0.0848
v17 Professional−Distorted (Small) | 1.3712 | 1.7032 | 1.4657 | 1.3774 | 1.6152 | 1.4430 | +0.0062 | -0.0880
v18 Expressive−Flat (Small) | 1.4142 | 1.7884 | 1.5018 | 1.4204 | 1.7138 | 1.4643 | +0.0063 | -0.0745
v19 FullPos−FullNeg (Small) | 1.3365 | 1.6734 | 1.4319 | 1.3446 | 1.5883 | 1.4059 | +0.0081 | -0.0851
v20 Warm−Robotic (Small) | 1.4482 | 1.8182 | 1.5464 | 1.4453 | 1.7334 | 1.5037 | -0.0029 | -0.0849
v21 Sanitized Prompt (Large) | 0.9617 | 1.2109 | 1.0314 | 0.9631 | 1.1460 | 1.0088 | +0.0014 | -0.0649
v22 Sanitized Prompt (Small) | 0.6298 | 0.7958 | 0.6605 | 0.6325 | 0.7626 | 0.6558 | +0.0026 | -0.0332
v23 Sanitized−Uncanny (Large) | 1.6294 | 2.0328 | 1.7549 | 1.6349 | 1.9304 | 1.7133 | +0.0055 | -0.1024
v24 Sanitized−Uncanny (Small) | 1.4003 | 1.7655 | 1.4991 | 1.4042 | 1.6709 | 1.4626 | +0.0039 | -0.0946

Per-Prompt Ablation: Standard Reward (N=10)

# | Lang | No Suffix Mean | No Suffix Best | With Suffix Mean | With Suffix Best | Δ Mean | Δ Best
0 | English | 3.6461 | 4.5530 | 3.4901 | 3.7667 | -0.1560 | -0.7863
1 | French | 4.0515 | 5.1346 | 4.7163 | 5.0872 | +0.6649 | -0.0474
2 | English | 1.5349 | 2.6575 | 1.2702 | 1.3610 | -0.2647 | -1.2965
3 | German | 4.9259 | 5.0158 | 4.9545 | 5.0567 | +0.0286 | +0.0409
4 | French | 4.3312 | 4.5559 | 3.9856 | 4.5478 | -0.3456 | -0.0081
5 | French | 3.3635 | 4.5606 | 2.3323 | 4.4187 | -1.0312 | -0.1419
6 | English | 3.2741 | 3.8898 | 3.1473 | 3.7320 | -0.1268 | -0.1578
7 | German | 2.1434 | 2.9305 | 2.4512 | 2.9967 | +0.3079 | +0.0662
8 | Spanish | 2.8368 | 4.5719 | 3.7254 | 4.6909 | +0.8886 | +0.1190
9 | French | 3.5076 | 3.9773 | 3.6906 | 4.2977 | +0.1829 | +0.3204
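Averaging the per-prompt columns reproduces the Standard row of the suffix ablation summary (Mean 3.3615 and Best 4.1847 without suffix; Mean 3.3763 and Best 3.9955 with suffix), with each delta taken as with-suffix minus without-suffix. The sketch below checks this from the rounded values in the per-prompt table; the variable and function names are illustrative.

```python
from statistics import mean

# Per prompt: (no_suffix_mean, no_suffix_best, with_suffix_mean, with_suffix_best),
# copied from the per-prompt ablation table (Standard reward, N=10).
PER_PROMPT = [
    (3.6461, 4.5530, 3.4901, 3.7667),
    (4.0515, 5.1346, 4.7163, 5.0872),
    (1.5349, 2.6575, 1.2702, 1.3610),
    (4.9259, 5.0158, 4.9545, 5.0567),
    (4.3312, 4.5559, 3.9856, 4.5478),
    (3.3635, 4.5606, 2.3323, 4.4187),
    (3.2741, 3.8898, 3.1473, 3.7320),
    (2.1434, 2.9305, 2.4512, 2.9967),
    (2.8368, 4.5719, 3.7254, 4.6909),
    (3.5076, 3.9773, 3.6906, 4.2977),
]


def summarize(rows: list) -> dict:
    """Average each column across prompts and form with-minus-without deltas."""
    no_mean, no_best, with_mean, with_best = (mean(col) for col in zip(*rows))
    return {
        "no_suffix_mean": no_mean,
        "no_suffix_best": no_best,
        "with_suffix_mean": with_mean,
        "with_suffix_best": with_best,
        "delta_mean": with_mean - no_mean,
        "delta_best": with_best - no_best,
    }


if __name__ == "__main__":
    for key, value in summarize(PER_PROMPT).items():
        print(f"{key}: {value:+.4f}")
```

The results agree with the summary row up to rounding in the last digit.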

Standard: (1−WER) × Content Enjoyment — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 4.0428 4.1408 4.3040 4.3142 4.3628
Std Dev 0.9987 1.0256 0.8478 0.8574 0.8669
Avg Mean 3.3009 3.4083 3.3088 3.3431 3.3545

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 4.6092 4.4613 4.7767 4.8450 4.8450
1 French 4.8753 4.9405 4.9800 4.9405 5.1346
2 English 2.0074 2.0326 2.7127 2.6575 2.7127
3 German 4.9731 5.0104 5.0779 5.0760 5.0958
4 French 4.4928 4.5622 4.7721 4.6559 4.7721
5 French 3.9766 4.7602 4.8669 4.8669 4.8669
6 English 3.8362 3.9019 3.9019 3.9869 3.9869
7 German 2.5906 2.5978 2.9544 2.9718 3.0040
8 Spanish 4.7735 4.8485 4.6745 4.8485 4.8772
9 French 4.2935 4.2923 4.3229 4.2935 4.3324
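The Avg Best and Std Dev rows can be recovered from the per-prompt values. For the N=5 column of the Standard reward, the mean across prompts gives 4.0428 and the sample standard deviation (n − 1 denominator) gives 0.9987. That the report uses the sample rather than the population form is an inference from the numbers, not something it states. A short check using Python's `statistics` module:

```python
from statistics import mean, stdev

# Per-prompt best reward at N=5 for the Standard method (prompts 0-9).
BEST_N5 = [4.6092, 4.8753, 2.0074, 4.9731, 4.4928,
           3.9766, 3.8362, 2.5906, 4.7735, 4.2935]


def aggregate(per_prompt_best: list) -> tuple:
    """Return (average best, sample standard deviation) across prompts."""
    return mean(per_prompt_best), stdev(per_prompt_best)


if __name__ == "__main__":
    avg_best, std_dev = aggregate(BEST_N5)
    print(f"Avg Best {avg_best:.4f}  Std Dev {std_dev:.4f}")
```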

VoiceCLAP-Large × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.9382 0.9669 0.9912 1.0029 1.0105
Std Dev 0.2386 0.2322 0.1934 0.1895 0.1935
Avg Mean 0.7730 0.8039 0.7775 0.7847 0.7867

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.1147 1.1060 1.1467 1.1844 1.1862
1 French 1.1369 1.1469 1.1469 1.1469 1.1535
2 English 0.5416 0.5584 0.7194 0.7339 0.7339
3 German 1.1707 1.1889 1.1718 1.1918 1.1918
4 French 1.1962 1.2330 1.2631 1.2276 1.2631
5 French 0.7821 0.9345 0.9020 0.9020 0.9345
6 English 0.8798 0.9053 0.9172 0.9319 0.9319
7 German 0.6082 0.6142 0.7200 0.7231 0.7231
8 Spanish 0.8656 0.9006 0.8644 0.9006 0.9006
9 French 1.0866 1.0808 1.0607 1.0866 1.0866

VoiceCLAP-Small × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.7145 0.7557 0.7905 0.7989 0.8163
Std Dev 0.2503 0.2709 0.2397 0.2265 0.2350
Avg Mean 0.5870 0.6080 0.5867 0.5937 0.5937

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.8147 0.7474 0.7407 0.8147 0.8147
1 French 0.8488 0.9494 0.9430 0.9430 1.0170
2 English 0.3480 0.3979 0.4652 0.5123 0.5123
3 German 1.0229 1.0576 1.0635 1.0671 1.0702
4 French 0.9377 1.1231 1.1231 1.1062 1.1231
5 French 0.4575 0.4484 0.5350 0.5576 0.5576
6 English 0.6366 0.7100 0.8218 0.7449 0.8218
7 German 0.4228 0.4753 0.5481 0.5481 0.5481
8 Spanish 0.6364 0.6185 0.6427 0.6692 0.6692
9 French 1.0197 1.0292 1.0223 1.0258 1.0292

VoiceCLAP-Large × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.3999 1.4481 1.4806 1.4939 1.5071
Std Dev 0.4026 0.4093 0.3382 0.3519 0.3453
Avg Mean 1.1457 1.1872 1.1505 1.1642 1.1670

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.6556 1.6680 1.7490 1.7484 1.7490
1 French 1.8017 1.8237 1.8017 1.8334 1.8334
2 English 0.6004 0.6204 0.8337 0.7960 0.8337
3 German 1.8638 1.8633 1.8698 1.8768 1.8768
4 French 1.5106 1.5326 1.5106 1.5326 1.5465
5 French 1.4225 1.6769 1.6505 1.6505 1.6850
6 English 1.2575 1.3315 1.3354 1.3846 1.3846
7 German 0.8502 0.8613 1.0097 1.0097 1.0097
8 Spanish 1.6026 1.6564 1.5870 1.6564 1.6564
9 French 1.4338 1.4472 1.4584 1.4508 1.4959

VoiceCLAP-Small × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.7935 0.8272 0.8358 0.8533 0.8745
Std Dev 0.2638 0.2790 0.2590 0.2505 0.2431
Avg Mean 0.6365 0.6639 0.6464 0.6498 0.6509

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.1269 1.2026 1.2313 1.2273 1.2313
1 French 0.7761 0.8312 0.8086 0.8157 0.8526
2 English 0.3177 0.3220 0.4026 0.4640 0.4640
3 German 1.1149 1.1149 1.1285 1.1585 1.1585
4 French 0.7659 0.7684 0.7746 0.7720 0.7769
5 French 0.7775 0.9295 0.8426 0.8637 0.9295
6 English 1.0807 1.1258 1.1101 1.1258 1.1258
7 German 0.5122 0.5182 0.5631 0.5778 0.6298
8 Spanish 0.6455 0.6840 0.7005 0.7105 0.7414
9 French 0.8172 0.7758 0.7964 0.8172 0.8355

v1 Natural (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.0182 1.0507 1.0753 1.0955 1.1007
Std Dev 0.2379 0.2240 0.1868 0.1745 0.1764
Avg Mean 0.8342 0.8666 0.8390 0.8465 0.8491

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.2314 1.2080 1.2740 1.2921 1.2921
1 French 1.2232 1.2332 1.2332 1.2377 1.2377
2 English 0.5997 0.6276 0.7881 0.8523 0.8523
3 German 1.1695 1.1722 1.1507 1.1986 1.1986
4 French 1.2457 1.2799 1.3074 1.2838 1.3074
5 French 0.8866 1.0465 1.0195 1.0195 1.0465
6 English 0.9340 0.9862 0.9862 1.0048 1.0048
7 German 0.7097 0.7272 0.8389 0.8410 0.8410
8 Spanish 0.9461 0.9893 0.9350 0.9893 0.9893
9 French 1.2362 1.2370 1.2198 1.2362 1.2370

v2 Authentic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.0417 1.0757 1.1070 1.1228 1.1283
Std Dev 0.2386 0.2339 0.1810 0.1710 0.1751
Avg Mean 0.8528 0.8875 0.8582 0.8678 0.8704

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.1051 1.0830 1.1501 1.1702 1.1702
1 French 1.2936 1.3127 1.3127 1.3222 1.3222
2 English 0.5765 0.6058 0.7828 0.8170 0.8170
3 German 1.2334 1.2331 1.2296 1.2522 1.2522
4 French 1.2353 1.2564 1.2991 1.2639 1.2991
5 French 0.9750 1.1631 1.1466 1.1466 1.1631
6 English 0.9358 0.9734 0.9799 1.0093 1.0093
7 German 0.7218 0.7421 0.8478 0.8630 0.8630
8 Spanish 1.1311 1.1742 1.1216 1.1742 1.1742
9 French 1.2093 1.2128 1.1998 1.2093 1.2128

v3 Professional (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.8839 0.9104 0.9344 0.9464 0.9553
Std Dev 0.2272 0.2160 0.1796 0.1750 0.1772
Avg Mean 0.7294 0.7574 0.7333 0.7396 0.7416

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.0549 1.0490 1.0790 1.1192 1.1192
1 French 1.0704 1.0783 1.0783 1.0783 1.0923
2 English 0.5017 0.5310 0.6894 0.7025 0.7025
3 German 1.1204 1.1388 1.1290 1.1388 1.1388
4 French 1.1153 1.1394 1.1697 1.1394 1.1697
5 French 0.7279 0.8725 0.8282 0.8282 0.8725
6 English 0.8527 0.8724 0.9049 0.9047 0.9049
7 German 0.5705 0.5782 0.6768 0.6910 0.6910
8 Spanish 0.8131 0.8501 0.8116 0.8501 0.8501
9 French 1.0116 0.9944 0.9771 1.0116 1.0116

v4 Expressive (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.0674 1.0989 1.1503 1.1399 1.1609
Std Dev 0.2425 0.2529 0.2048 0.1955 0.2083
Avg Mean 0.8691 0.9041 0.8739 0.8839 0.8864

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.2289 1.2245 1.3037 1.3001 1.3037
1 French 1.2979 1.3219 1.3527 1.3219 1.3527
2 English 0.5868 0.6071 0.8132 0.8134 0.8134
3 German 1.1185 1.1288 1.1510 1.1510 1.1510
4 French 1.2537 1.2548 1.3456 1.2817 1.3456
5 French 1.0679 1.3005 1.2465 1.2465 1.3005
6 English 1.1771 1.1903 1.2155 1.1905 1.2155
7 German 0.6678 0.6654 0.7671 0.7616 0.7671
8 Spanish 1.1366 1.1717 1.1317 1.1717 1.1717
9 French 1.1385 1.1239 1.1762 1.1606 1.1881

v5 Cinematic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.0103 1.0374 1.0845 1.0736 1.0972
Std Dev 0.2397 0.2533 0.2088 0.1943 0.2124
Avg Mean 0.8174 0.8536 0.8249 0.8344 0.8363

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.0651 1.0344 1.1200 1.1066 1.1200
1 French 1.2557 1.2760 1.2809 1.2583 1.2809
2 English 0.5464 0.5541 0.7294 0.7332 0.7332
3 German 1.1563 1.1557 1.1629 1.1597 1.1629
4 French 1.2169 1.2297 1.3178 1.2297 1.3178
5 French 1.0553 1.2633 1.2140 1.2140 1.2633
6 English 1.0399 1.0740 1.1273 1.1275 1.1275
7 German 0.6163 0.6187 0.7168 0.7167 0.7168
8 Spanish 1.0128 1.0402 1.0107 1.0402 1.0487
9 French 1.1381 1.1281 1.1657 1.1496 1.2004

v6 Natural (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.7287 0.7761 0.8128 0.8190 0.8507
Std Dev 0.2415 0.2469 0.2125 0.2120 0.2348
Avg Mean 0.6028 0.6300 0.6110 0.6160 0.6182

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.5059 0.5173 0.5334 0.5646 0.5646
1 French 1.0599 1.0794 1.1280 1.1280 1.1919
2 English 0.3369 0.4198 0.4931 0.5142 0.5142
3 German 0.9284 1.0025 1.0025 1.0094 1.0473
4 French 0.8590 0.9032 0.9129 0.9639 0.9639
5 French 0.5827 0.6050 0.7279 0.7124 0.7279
6 English 0.6672 0.7264 0.8368 0.7746 0.8368
7 German 0.5587 0.5796 0.6511 0.6488 0.6511
8 Spanish 0.7423 0.7995 0.8124 0.8272 0.8808
9 French 1.0463 1.1286 1.0296 1.0466 1.1286

v7 Authentic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.6651 0.7163 0.7486 0.7537 0.7703
Std Dev 0.2203 0.2239 0.1956 0.2015 0.2119
Avg Mean 0.5480 0.5734 0.5544 0.5592 0.5620

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.4735 0.4737 0.5110 0.5215 0.5215
1 French 0.9586 1.0144 1.0517 1.0517 1.1154
2 English 0.3252 0.4292 0.4978 0.4965 0.4978
3 German 0.9092 0.9629 0.9629 0.9911 0.9939
4 French 0.7944 0.8742 0.8151 0.8742 0.8742
5 French 0.5869 0.5844 0.7027 0.7094 0.7094
6 English 0.5076 0.6049 0.6958 0.6361 0.6958
7 German 0.4675 0.4787 0.5227 0.5371 0.5371
8 Spanish 0.7505 0.8420 0.8443 0.8420 0.8590
9 French 0.8771 0.8990 0.8817 0.8771 0.8990

v8 Professional (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.6109 0.6477 0.6647 0.6831 0.7008
Std Dev 0.2137 0.2175 0.1938 0.1962 0.1948
Avg Mean 0.5024 0.5159 0.4972 0.5037 0.5041

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.9077 0.8387 0.7916 0.9145 0.9145
1 French 0.6426 0.7534 0.7256 0.7340 0.7567
2 English 0.3374 0.3942 0.4532 0.5248 0.5248
3 German 0.8259 0.8405 0.8592 0.8675 0.8862
4 French 0.7733 0.9628 0.9628 0.9288 0.9628
5 French 0.3198 0.3729 0.3862 0.3862 0.4062
6 English 0.5761 0.6278 0.6672 0.6410 0.6672
7 German 0.3825 0.3985 0.4904 0.4671 0.4904
8 Spanish 0.5511 0.5038 0.5166 0.5523 0.5845
9 French 0.7924 0.7844 0.7942 0.8146 0.8146

v9 Natural−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7977 1.8396 1.8937 1.9131 1.9235
Std Dev 0.4141 0.4187 0.3319 0.3083 0.3150
Avg Mean 1.4730 1.5270 1.4786 1.4947 1.4977

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0661 2.0505 2.1619 2.1282 2.1619
1 French 2.1434 2.1459 2.1522 2.1463 2.1522
2 English 0.9397 0.9205 1.1950 1.2670 1.2670
3 German 2.1502 2.1543 2.1609 2.1822 2.1822
4 French 2.0839 2.1082 2.1479 2.1064 2.1479
5 French 1.6871 1.9733 1.9552 1.9552 1.9787
6 English 1.6684 1.7464 1.7464 1.8327 1.8327
7 German 1.2654 1.2715 1.4843 1.4843 1.4843
8 Spanish 1.8729 1.9281 1.8559 1.9281 1.9281
9 French 2.1002 2.0969 2.0773 2.1002 2.1002

v10 Authentic−Cheap (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.8655 1.9179 1.9764 1.9911 2.0030
Std Dev 0.4594 0.4635 0.3655 0.3525 0.3586
Avg Mean 1.5224 1.5874 1.5349 1.5534 1.5569

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.8768 1.8537 1.9771 1.9443 1.9771
1 French 2.3525 2.3673 2.3673 2.3777 2.3777
2 English 0.9413 0.9562 1.2536 1.3008 1.3008
3 German 2.2198 2.2221 2.2121 2.2457 2.2457
4 French 2.2105 2.2448 2.2996 2.2351 2.2996
5 French 1.8523 2.1548 2.1511 2.1511 2.1548
6 English 1.6756 1.7574 1.7574 1.8312 1.8312
7 German 1.2547 1.2891 1.4908 1.4908 1.4908
8 Spanish 2.0719 2.1348 2.0683 2.1348 2.1526
9 French 2.2000 2.1986 2.1863 2.2000 2.2000

v11 Professional−Distorted (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7750 1.8247 1.8765 1.8977 1.9133
Std Dev 0.4303 0.4140 0.3341 0.3139 0.3261
Avg Mean 1.4484 1.5107 1.4600 1.4749 1.4790

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0527 2.0382 2.1174 2.1641 2.1641
1 French 2.1680 2.1726 2.1726 2.1850 2.1900
2 English 0.9756 1.0290 1.3382 1.3799 1.3799
3 German 2.0677 2.0745 2.0411 2.0854 2.0874
4 French 2.2031 2.2408 2.3294 2.2408 2.3294
5 French 1.6320 1.9314 1.8749 1.8749 1.9314
6 English 1.7127 1.7319 1.7685 1.7832 1.7832
7 German 1.1227 1.1594 1.3337 1.3774 1.3774
8 Spanish 1.7627 1.8335 1.7722 1.8335 1.8372
9 French 2.0527 2.0358 2.0174 2.0527 2.0527

v12 Expressive−Flat (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.8149 1.8563 1.9296 1.9202 1.9482
Std Dev 0.4287 0.4519 0.3640 0.3593 0.3693
Avg Mean 1.4762 1.5332 1.4822 1.5004 1.5031

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0991 2.0966 2.2230 2.1829 2.2230
1 French 2.2124 2.2356 2.2663 2.2356 2.2663
2 English 0.8995 0.8832 1.1952 1.1778 1.1952
3 German 2.0349 2.0397 2.0653 2.0653 2.0653
4 French 2.0778 2.0615 2.1639 2.0778 2.1639
5 French 1.8498 2.2058 2.1574 2.1574 2.2058
6 English 1.9146 1.9434 1.9497 2.0029 2.0029
7 German 1.1719 1.1811 1.3584 1.3512 1.3584
8 Spanish 1.9742 2.0180 1.9580 2.0180 2.0180
9 French 1.9146 1.8979 1.9585 1.9334 1.9831

v13 FullPos−FullNeg (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7943 1.8421 1.8963 1.9123 1.9265
Std Dev 0.4474 0.4435 0.3601 0.3461 0.3552
Avg Mean 1.4693 1.5295 1.4792 1.4960 1.4989

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0454 2.0362 2.1145 2.1440 2.1495
1 French 2.2890 2.2936 2.2936 2.2952 2.2952
2 English 0.9306 0.9466 1.2411 1.2727 1.2727
3 German 2.1082 2.1209 2.0988 2.1393 2.1393
4 French 2.1436 2.1673 2.2611 2.1752 2.2611
5 French 1.7027 1.9983 1.9584 1.9584 1.9983
6 English 1.6638 1.7470 1.7470 1.8141 1.8141
7 German 1.1577 1.1723 1.3648 1.3687 1.3687
8 Spanish 1.7973 1.8505 1.7913 1.8505 1.8609
9 French 2.1048 2.0886 2.0928 2.1048 2.1048

v14 Warm−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6974 1.7359 1.7887 1.8043 1.8119
Std Dev 0.4156 0.4236 0.3399 0.3324 0.3368
Avg Mean 1.3936 1.4459 1.3999 1.4153 1.4178

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.9632 1.9719 2.0616 2.0705 2.0788
1 French 2.0440 2.0538 2.0585 2.0550 2.0622
2 English 0.8698 0.8430 1.1213 1.1390 1.1390
3 German 2.1664 2.1821 2.1692 2.1821 2.1821
4 French 1.9631 1.9845 2.0474 2.0074 2.0474
5 French 1.5726 1.8374 1.8202 1.8202 1.8412
6 English 1.5978 1.6422 1.6411 1.7239 1.7239
7 German 1.1361 1.1446 1.3392 1.3392 1.3392
8 Spanish 1.7527 1.7974 1.7458 1.7974 1.7974
9 French 1.9081 1.9020 1.8825 1.9081 1.9081

v15 Natural−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.8026 1.8541 1.9326 1.9311 1.9696
Std Dev 0.5010 0.4799 0.3846 0.3942 0.4028
Avg Mean 1.4779 1.5389 1.4892 1.5032 1.5074

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.7270 1.7081 1.7852 1.8093 1.8093
1 French 2.3623 2.3793 2.4191 2.3883 2.4598
2 English 0.8584 0.9144 1.1863 1.1794 1.1863
3 German 2.3182 2.3214 2.3273 2.3333 2.3333
4 French 2.2212 2.2375 2.2341 2.2474 2.2894
5 French 1.6948 1.8322 1.9607 1.9607 2.0144
6 English 1.5675 1.7655 1.9466 1.7968 1.9466
7 German 1.2139 1.2447 1.4508 1.4508 1.4508
8 Spanish 1.8193 1.9025 1.8971 1.9025 1.9630
9 French 2.2429 2.2356 2.1185 2.2429 2.2429

v16 Authentic−Cheap (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7817 1.8385 1.9087 1.9118 1.9354
Std Dev 0.5341 0.5212 0.4440 0.4598 0.4623
Avg Mean 1.4495 1.5095 1.4612 1.4749 1.4803

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.4267 1.4622 1.5581 1.5558 1.5581
1 French 2.4885 2.4814 2.5158 2.4835 2.5732
2 English 0.8388 0.8956 1.1289 1.0909 1.1289
3 German 2.4015 2.4581 2.4679 2.4914 2.4980
4 French 2.0734 2.0838 2.0420 2.0838 2.0841
5 French 1.8814 1.9653 2.1144 2.1027 2.1144
6 English 1.4083 1.6056 1.7207 1.6402 1.7207
7 German 1.2321 1.2453 1.4454 1.4454 1.4454
8 Spanish 1.9827 2.1401 2.1290 2.1401 2.1476
9 French 2.0838 2.0481 1.9650 2.0838 2.0838

v17 Professional−Distorted (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6513 1.7063 1.7464 1.7580 1.7837
Std Dev 0.4352 0.4605 0.3781 0.3729 0.3824
Avg Mean 1.3513 1.4076 1.3615 1.3761 1.3789

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.9622 1.9468 2.0330 2.0690 2.0690
1 French 1.9589 2.0657 2.0189 2.0335 2.0781
2 English 0.8182 0.7728 1.0274 1.0490 1.0490
3 German 2.0161 2.0477 2.0879 2.0561 2.0879
4 French 2.0757 2.2022 2.1892 2.1122 2.2022
5 French 1.4342 1.7543 1.7053 1.7053 1.7543
6 English 1.5985 1.6178 1.6909 1.7206 1.7206
7 German 1.0469 1.0500 1.2384 1.1971 1.2384
8 Spanish 1.6495 1.6846 1.5991 1.6846 1.6852
9 French 1.9524 1.9206 1.8735 1.9524 1.9524

v18 Expressive−Flat (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7013 1.7937 1.8276 1.8426 1.8982
Std Dev 0.4111 0.4399 0.3261 0.3341 0.3534
Avg Mean 1.3805 1.4300 1.3837 1.3990 1.4019

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0113 2.0734 2.0734 2.1087 2.1087
1 French 1.8646 2.0749 2.0749 2.1048 2.1626
2 English 0.8097 0.8343 1.1538 1.1034 1.1538
3 German 1.9289 1.9227 1.9575 1.9506 1.9797
4 French 2.0792 2.0168 2.1658 2.0792 2.1658
5 French 1.6738 2.1693 1.8946 1.9074 2.1693
6 English 1.7307 1.7948 1.7573 1.8398 1.8398
7 German 1.1270 1.1637 1.3555 1.3880 1.3880
8 Spanish 1.9344 2.0282 1.9304 2.0282 2.0979
9 French 1.8532 1.8592 1.9131 1.9163 1.9163

v19 FullPos−FullNeg (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6088 1.6750 1.7174 1.7289 1.7616
Std Dev 0.3849 0.3989 0.3002 0.3166 0.3198
Avg Mean 1.3245 1.3711 1.3261 1.3382 1.3428

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.8097 1.7644 1.8451 1.9185 1.9185
1 French 1.9501 2.0587 2.0406 2.0406 2.0958
2 English 0.8356 0.8575 1.1132 1.1050 1.1132
3 German 1.9650 2.0260 2.0101 2.0182 2.0464
4 French 1.8949 2.0296 2.0049 1.9771 2.0296
5 French 1.4371 1.7113 1.6936 1.7124 1.7480
6 English 1.4514 1.5264 1.6523 1.5550 1.6523
7 German 1.1311 1.1354 1.3329 1.3050 1.3329
8 Spanish 1.7368 1.7817 1.7025 1.7817 1.8034
9 French 1.8759 1.8590 1.7791 1.8759 1.8759

v20 Warm−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7561 1.7931 1.8711 1.8821 1.9198
Std Dev 0.5128 0.4824 0.3923 0.4187 0.4156
Avg Mean 1.4272 1.4791 1.4304 1.4457 1.4476

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0760 1.9657 2.0306 2.0822 2.0975
1 French 2.0265 2.0437 2.0827 2.0473 2.1600
2 English 0.8115 0.8188 1.1059 1.0800 1.1059
3 German 2.2480 2.2188 2.2751 2.3131 2.3131
4 French 2.2909 2.3624 2.3603 2.3928 2.3928
5 French 1.6177 1.8221 1.8468 1.8619 1.9305
6 English 1.5157 1.7344 1.8718 1.7344 1.8718
7 German 1.0554 1.1254 1.3563 1.3563 1.3563
8 Spanish 1.7153 1.7404 1.7171 1.7489 1.7656
9 French 2.2044 2.0996 2.0646 2.2044 2.2044

v21 Sanitized Prompt (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.1625 1.1956 1.2300 1.2426 1.2543
Std Dev 0.3060 0.3030 0.2460 0.2545 0.2517
Avg Mean 0.9454 0.9796 0.9509 0.9609 0.9635

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.4779 1.4527 1.5485 1.5434 1.5485
1 French 1.4304 1.4440 1.4440 1.4551 1.4551
2 English 0.5605 0.5952 0.7945 0.7625 0.7945
3 German 1.3969 1.3842 1.3788 1.3906 1.4043
4 French 1.2123 1.2185 1.2123 1.2185 1.2325
5 French 1.1994 1.4025 1.3866 1.3961 1.4199
6 English 1.1788 1.2434 1.2325 1.2911 1.2911
7 German 0.6810 0.6975 0.8356 0.8356 0.8356
8 Spanish 1.1934 1.2387 1.1831 1.2387 1.2387
9 French 1.2943 1.2796 1.2842 1.2943 1.3225

v22 Sanitized Prompt (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.7789 0.8160 0.8245 0.8353 0.8623
Std Dev 0.2527 0.2791 0.2632 0.2584 0.2538
Avg Mean 0.6175 0.6439 0.6281 0.6316 0.6324

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.1392 1.2228 1.2553 1.2746 1.2746
1 French 0.7895 0.8362 0.7981 0.8180 0.8362
2 English 0.3173 0.3265 0.3999 0.4420 0.4420
3 German 0.9779 0.9935 1.0080 0.9989 1.0138
4 French 0.6726 0.6710 0.6748 0.6828 0.6828
5 French 0.7903 0.9830 0.8756 0.8983 0.9830
6 English 1.1144 1.1540 1.1778 1.1778 1.1778
7 German 0.5287 0.5264 0.5855 0.5855 0.6361
8 Spanish 0.7127 0.7282 0.7284 0.7282 0.8130
9 French 0.7466 0.7187 0.7416 0.7466 0.7637

v23 Sanitized−Uncanny (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.9633 2.0168 2.0701 2.0903 2.0995
Std Dev 0.5137 0.5115 0.4097 0.4161 0.4159
Avg Mean 1.6025 1.6612 1.6097 1.6279 1.6312

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.4235 2.4099 2.5149 2.5075 2.5149
1 French 2.4211 2.4300 2.4300 2.4361 2.4361
2 English 0.9111 0.9389 1.2605 1.2413 1.2605
3 German 2.3757 2.3748 2.3629 2.3808 2.3907
4 French 2.1337 2.1473 2.1297 2.1389 2.1473
5 French 1.9674 2.2905 2.2750 2.2811 2.3235
6 English 1.9078 2.0176 2.0176 2.1223 2.1223
7 German 1.2078 1.2391 1.4548 1.4548 1.4548
8 Spanish 2.0756 2.1304 2.0623 2.1304 2.1331
9 French 2.2095 2.1894 2.1934 2.2095 2.2118
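
To make the diminishing-returns pattern in the v23 summary row concrete, the following sketch computes the gain in "Avg Best" between consecutive N values and divides it by the number of extra candidates. The per-candidate normalisation is an illustrative choice, not something the report defines, and the names are hypothetical.

```python
# "Avg Best" by N for v23, copied from the summary table above.
avg_best = {5: 1.9633, 10: 2.0168, 25: 2.0701, 50: 2.0903, 100: 2.0995}

ns = sorted(avg_best)
per_candidate = []
for lo, hi in zip(ns, ns[1:]):
    gain = avg_best[hi] - avg_best[lo]   # absolute improvement in Avg Best
    rate = gain / (hi - lo)              # improvement per extra candidate
    per_candidate.append(rate)
    print(f"N={lo:>3} -> N={hi:>3}: gain {gain:.4f}, per candidate {rate:.5f}")
```

Each step up in N buys less per additional candidate than the previous one: roughly 0.011 per candidate between N=5 and N=10, falling to about 0.0002 between N=50 and N=100.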

v24 Sanitized−Uncanny (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6995 1.7653 1.7926 1.8202 1.8531
Std Dev 0.4661 0.5016 0.4490 0.4356 0.4366
Avg Mean 1.3750 1.4230 1.3857 1.3972 1.4009

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.2043 2.2733 2.3453 2.3853 2.3853
1 French 1.9651 2.0335 2.0216 2.0457 2.0783
2 English 0.6984 0.6728 0.8523 0.9319 0.9319
3 German 2.2074 2.2074 2.2496 2.2390 2.2496
4 French 1.8575 1.8757 1.8739 1.8575 1.9090
5 French 1.6828 2.0877 1.8975 1.9718 2.0877
6 English 1.9177 2.0165 2.0698 2.0698 2.0698
7 German 1.1686 1.1817 1.3182 1.3182 1.3646
8 Spanish 1.5776 1.6667 1.6131 1.6667 1.7392
9 French 1.7159 1.6377 1.6843 1.7159 1.7159

Prompt #0 — English (Silicon Valley accent)

Language: English
Accent: Silicon Valley accent
Scored: 100/100
DramaBox Prompt
A young woman, possessing an extremely high fundamental frequency and bright, delicate harmonic texture, with a brisk, elevated momentum and a Silicon Valley accent; this is a pristine, high-quality studio voice recording with no background noise. She delivers the lines with a teasing lightness that occasionally borders on nervous energy, punctuated by small moments of genuine relief. (A brief, high-pitched Giggle escapes as she begins.) "Honestly, you think finding a solid Firestone review is that hard? Boggle, really. But look, that Lys thing actually worked." (She pauses, a subtle Contemplation washing over her features, then manages a slight, contained Chuckle.) "Just wait, I'll show you." She concludes with a soft, almost satisfied sigh, allowing the tension to dissipate.

Prompt #1 — French

Language: French
Scored: 100/100
DramaBox Prompt
High-pitched, delicately resonant, and possessing the slightly strained clarity of a young adult female soprano; the voice is bright and purely head-dominant, engineered for intimate projection.

Pauses briefly, gathering strength. "Malgré la profondeur de cette sombre forêt, je sens toujours cette confiance absolue en mon chemin, guidée par la lumière."
A slight, almost imperceptible hardening of tone. "Même au cœur de cette nuit insondable, ma boussole intérieure me montre la seule direction véritable."
She finishes, a note of unwavering certainty settling.

The pace remains glacially slow throughout the utterance. The delivery conveys immense, quiet self-assurance.