Best-of-N Analysis: Diminishing Returns

10 Path A prompts × 100 candidates — 29 ranking methods

Methodology: Ranking Method Formulas & Text Prompts

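All methods use reward = (1 − WER) × max(score, 0); the score varies per method as described below. In the method keys, the _L / _S suffixes (and the l / s in the clap_* keys) denote VoiceCLAP-Large and VoiceCLAP-Small, and cos(audio, text) is the cosine similarity between the audio embedding and the embedding of the given text prompt.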

# | Key | Score Formula | Text Prompt(s)
0 | standard | Content Enjoyment | (none)
1 | clap_lq | cos(audio, quality_text) | "pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
2 | clap_sq | cos(audio, quality_text) | "pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
3 | clap_lp | cos(audio, prompt) | Original DramaBox prompt
4 | clap_sp | cos(audio, prompt) | Original DramaBox prompt
5 | v1_nat_L | cos(audio, nat) | "natural, spontaneous, lifelike speech with genuine emotion"
6 | v2_auth_L | cos(audio, auth) | "authentic, emotionally truthful, deeply felt voice performance"
7 | v3_pro_L | cos(audio, pro) | "professional studio recording, crystal clear high-fidelity audio"
8 | v4_expr_L | cos(audio, expr) | "expressive, dynamic voice acting with rich emotional range"
9 | v5_cine_L | cos(audio, cine) | "immersive cinematic narration, compelling storytelling"
10 | v6_nat_S | cos(audio, nat) | "natural, spontaneous, lifelike speech with genuine emotion"
11 | v7_auth_S | cos(audio, auth) | "authentic, emotionally truthful, deeply felt voice performance"
12 | v8_pro_S | cos(audio, pro) | "professional studio recording, crystal clear high-fidelity audio"
13 | v9_nr_L | cos(audio, nat) − cos(audio, rob) | + "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
14 | v10_ac_L | cos(audio, auth) − cos(audio, cheap) | + "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
15 | v11_pd_L | cos(audio, pro) − cos(audio, dist) | + "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
16 | v12_ef_L | cos(audio, expr) − cos(audio, flat) | + "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
17 | v13_ff_L | cos(audio, full_pos) − cos(audio, full_neg) | + "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
18 | v14_wr_L | cos(audio, warm) − cos(audio, rob) | + "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
19 | v15_nr_S | cos(audio, nat) − cos(audio, rob) | + "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
20 | v16_ac_S | cos(audio, auth) − cos(audio, cheap) | + "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
21 | v17_pd_S | cos(audio, pro) − cos(audio, dist) | + "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
22 | v18_ef_S | cos(audio, expr) − cos(audio, flat) | + "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
23 | v19_ff_S | cos(audio, full_pos) − cos(audio, full_neg) | + "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
24 | v20_wr_S | cos(audio, warm) − cos(audio, rob) | + "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
25 | v21_san_L | cos(audio, sanitized_prompt) | Quoted speech removed (Large)
26 | v22_san_S | cos(audio, sanitized_prompt) | Quoted speech removed (Small)
27 | v23_snr_L | cos(audio, sanitized) − cos(audio, neg_san) | Sanitized / − "robotic, distorted, uncanny" (Large)
28 | v24_snr_S | cos(audio, sanitized) − cos(audio, neg_san) | Sanitized / − "robotic, distorted, uncanny" (Small)
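The shared reward and the two score families in the table above (positive-only and positive-minus-negative) can be sketched as follows. This is a minimal illustration, not the report's code: the function names are hypothetical, embeddings are assumed to be plain NumPy vectors produced elsewhere (for example by a VoiceCLAP model), and `cos` is taken to be raw cosine similarity. Several averages in the tables exceed what raw cosine allows (e.g. 1.4585 for VoiceCLAP-Large × Prompt Match), so the report's scores likely include a scaling that this sketch does not model. For the standard method, `score` would be the Content Enjoyment value instead.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def positive_score(audio: np.ndarray, pos: np.ndarray) -> float:
    """Positive-only methods (clap_*, v1-v8, v21-v22): cos(audio, pos)."""
    return cosine(audio, pos)


def contrastive_score(audio: np.ndarray, pos: np.ndarray, neg: np.ndarray) -> float:
    """Pos-Neg methods (v9-v20, v23-v24): cos(audio, pos) - cos(audio, neg)."""
    return cosine(audio, pos) - cosine(audio, neg)


def reward(wer: float, score: float) -> float:
    """Shared reward: (1 - WER) * max(score, 0)."""
    return (1.0 - wer) * max(score, 0.0)


def best_of_n(wers, scores):
    """Pick the candidate with the highest reward; return (index, reward)."""
    rewards = [reward(w, s) for w, s in zip(wers, scores)]
    best = int(np.argmax(rewards))
    return best, rewards[best]


# Toy example with hypothetical 2-D "embeddings".
pos_text = np.array([1.0, 0.0])  # stands in for a positive prompt embedding
neg_text = np.array([0.0, 1.0])  # stands in for a negative prompt embedding
candidates = [np.array([1.0, 0.0]), np.array([0.6, 0.8]), np.array([0.0, 1.0])]
candidate_wers = [0.5, 0.0, 0.0]

scores = [contrastive_score(c, pos_text, neg_text) for c in candidates]
idx, best_reward = best_of_n(candidate_wers, scores)
print(idx, best_reward)
```

In the toy example the second and third candidates have zero WER but negative contrastive scores, which max(score, 0) clips to zero, so the first candidate wins despite its 50% WER.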

Cross-Method Diminishing Returns Comparison

Values are the average best reward across the 10 prompts at each N.

Method N=5 N=10 N=25 N=50 N=100 Gain N=5→100 Knee Point
Standard: (1−WER) × Content Enjoyment 3.9546 4.0007 4.0772 4.1326 4.1640 +0.2093 N=50
VoiceCLAP-Large × Quality Text 0.9101 0.9363 0.9531 0.9551 0.9596 +0.0495 N=50
VoiceCLAP-Small × Quality Text 0.7434 0.7655 0.7740 0.7969 0.7989 +0.0555 N=25
VoiceCLAP-Large × Prompt Match 1.3972 1.4204 1.4363 1.4510 1.4585 +0.0612 N=25
VoiceCLAP-Small × Prompt Match 0.8601 0.8677 0.8774 0.8891 0.9028 +0.0426 N=25
v1 Natural (Large) 0.9750 1.0031 1.0236 1.0246 1.0333 +0.0583 N=50
v2 Authentic (Large) 1.0092 1.0361 1.0536 1.0591 1.0631 +0.0539 N=50
v3 Professional (Large) 0.8574 0.8850 0.9008 0.9019 0.9088 +0.0513 N=50
v4 Expressive (Large) 1.0185 1.0361 1.0584 1.0717 1.0756 +0.0571 N=50
v5 Cinematic (Large) 0.9942 1.0057 1.0249 1.0338 1.0391 +0.0449 N=50
v6 Natural (Small) 0.7754 0.8111 0.8265 0.8481 0.8562 +0.0807 N=100
v7 Authentic (Small) 0.7471 0.7727 0.7910 0.8014 0.8104 +0.0634 N=50
v8 Professional (Small) 0.6180 0.6390 0.6422 0.6624 0.6708 +0.0528 N=25
v9 Natural−Robotic (Large) 1.7588 1.7906 1.8169 1.8278 1.8339 +0.0751 N=25
v10 Authentic−Cheap (Large) 1.8458 1.8761 1.9049 1.9166 1.9189 +0.0731 N=50
v11 Professional−Distorted (Large) 1.7378 1.7827 1.8108 1.8153 1.8213 +0.0835 N=50
v12 Expressive−Flat (Large) 1.7552 1.7865 1.8133 1.8333 1.8406 +0.0854 N=50
v13 FullPos−FullNeg (Large) 1.7552 1.7934 1.8229 1.8280 1.8360 +0.0808 N=50
v14 Warm−Robotic (Large) 1.6719 1.7076 1.7300 1.7401 1.7497 +0.0778 N=25
v15 Natural−Robotic (Small) 1.8316 1.8742 1.9115 1.9234 1.9383 +0.1067 N=50
v16 Authentic−Cheap (Small) 1.8657 1.8969 1.9666 1.9741 1.9951 +0.1294 N=50
v17 Professional−Distorted (Small) 1.7016 1.7315 1.7590 1.7725 1.7755 +0.0739 N=50
v18 Expressive−Flat (Small) 1.5761 1.6173 1.6483 1.6891 1.7099 +0.1337 N=100
v19 FullPos−FullNeg (Small) 1.6357 1.6796 1.7039 1.7201 1.7248 +0.0892 N=25
v20 Warm−Robotic (Small) 1.7297 1.7715 1.8069 1.8281 1.8450 +0.1153 N=50
v21 Sanitized Prompt (Large) 1.1532 1.1751 1.1969 1.2031 1.2158 +0.0625 N=50
v22 Sanitized Prompt (Small) 0.8346 0.8339 0.8457 0.8624 0.8819 +0.0473 N=10
v23 Sanitized−Uncanny (Large) 1.9507 1.9826 2.0204 2.0303 2.0413 +0.0906 N=50
v24 Sanitized−Uncanny (Small) 1.8188 1.8346 1.8690 1.9133 1.9251 +0.1063 N=50
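The report does not state how the N-candidate subsets were drawn from each prompt's 100-candidate pool. Assuming, for illustration, that every N-subset is equally likely, the expected best reward at a given N can be computed exactly from the pool, with no resampling. The function name is hypothetical; this is a sketch of the standard order-statistics identity, not the procedure used to produce the table.

```python
from math import comb


def expected_best_of_n(rewards, n):
    """Exact expected maximum over all size-n subsets of `rewards`,
    each subset equally likely (sampling without replacement).

    Sort ascending as x_1 <= ... <= x_M. Element x_i is the largest member
    of exactly comb(i - 1, n - 1) of the comb(M, n) subsets, so
    E[max] = sum_i x_i * comb(i - 1, n - 1) / comb(M, n).
    Ties are harmless: tied values yield the same maximum.
    """
    xs = sorted(rewards)
    m = len(xs)
    if not 1 <= n <= m:
        raise ValueError("n must satisfy 1 <= n <= len(rewards)")
    # enumerate() is 0-based, so index i here corresponds to x_{i+1};
    # comb(i, n - 1) is 0 whenever i < n - 1.
    weighted = sum(x * comb(i, n - 1) for i, x in enumerate(xs))
    return weighted / comb(m, n)


pool = [1.0, 2.0, 3.0]  # toy per-candidate rewards for one prompt
print([expected_best_of_n(pool, n) for n in (1, 2, 3)])
```

At n = 1 this is the pool mean and at n = len(pool) it is the pool maximum; averaging it over the 10 prompts would give a curve that is non-decreasing in N by construction.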

Diminishing Returns — All Methods Overlaid

[Figure: Reward vs. N candidates (N=5, 10, 25, 50, 100) in six panels. Original (5): Standard (1−WER) × Content Enjoyment, VoiceCLAP-Large × Quality Text, VoiceCLAP-Small × Quality Text, VoiceCLAP-Large × Prompt Match, VoiceCLAP-Small × Prompt Match. Positive-only, Large (5): v1-v5. Positive-only, Small (3): v6-v8. Pos−Neg, Large (6): v9-v14. Pos−Neg, Small (6): v15-v20. Sanitized Prompt (4): v21-v24.]

Marginal Improvement per Additional Candidate

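Each cell gives the gain in average best reward per added candidate, (Best(N2) − Best(N1)) / (N2 − N1); the percentage in parentheses is the relative gain over the whole interval, (Best(N2) − Best(N1)) / Best(N1).

Method N=5→10 N=10→25 N=25→50 N=50→100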
Standard: (1−WER) × Content Enjoyment 0.00920/cand (1.2%) 0.00510/cand (1.9%) 0.00222/cand (1.4%) 0.00063/cand (0.8%)
VoiceCLAP-Large × Quality Text 0.00524/cand (2.9%) 0.00112/cand (1.8%) 0.00008/cand (0.2%) 0.00009/cand (0.5%)
VoiceCLAP-Small × Quality Text 0.00441/cand (3.0%) 0.00057/cand (1.1%) 0.00092/cand (3.0%) 0.00004/cand (0.2%)
VoiceCLAP-Large × Prompt Match 0.00463/cand (1.7%) 0.00106/cand (1.1%) 0.00059/cand (1.0%) 0.00015/cand (0.5%)
VoiceCLAP-Small × Prompt Match 0.00152/cand (0.9%) 0.00065/cand (1.1%) 0.00047/cand (1.3%) 0.00027/cand (1.5%)
v1 Natural (Large) 0.00561/cand (2.9%) 0.00137/cand (2.0%) 0.00004/cand (0.1%) 0.00018/cand (0.9%)
v2 Authentic (Large) 0.00538/cand (2.7%) 0.00117/cand (1.7%) 0.00022/cand (0.5%) 0.00008/cand (0.4%)
v3 Professional (Large) 0.00552/cand (3.2%) 0.00105/cand (1.8%) 0.00005/cand (0.1%) 0.00014/cand (0.8%)
v4 Expressive (Large) 0.00352/cand (1.7%) 0.00149/cand (2.2%) 0.00053/cand (1.3%) 0.00008/cand (0.4%)
v5 Cinematic (Large) 0.00229/cand (1.2%) 0.00128/cand (1.9%) 0.00036/cand (0.9%) 0.00011/cand (0.5%)
v6 Natural (Small) 0.00714/cand (4.6%) 0.00103/cand (1.9%) 0.00086/cand (2.6%) 0.00016/cand (0.9%)
v7 Authentic (Small) 0.00512/cand (3.4%) 0.00122/cand (2.4%) 0.00042/cand (1.3%) 0.00018/cand (1.1%)
v8 Professional (Small) 0.00420/cand (3.4%) 0.00022/cand (0.5%) 0.00081/cand (3.1%) 0.00017/cand (1.3%)
v9 Natural−Robotic (Large) 0.00636/cand (1.8%) 0.00175/cand (1.5%) 0.00044/cand (0.6%) 0.00012/cand (0.3%)
v10 Authentic−Cheap (Large) 0.00606/cand (1.6%) 0.00192/cand (1.5%) 0.00047/cand (0.6%) 0.00005/cand (0.1%)
v11 Professional−Distorted (Large) 0.00898/cand (2.6%) 0.00187/cand (1.6%) 0.00018/cand (0.2%) 0.00012/cand (0.3%)
v12 Expressive−Flat (Large) 0.00626/cand (1.8%) 0.00179/cand (1.5%) 0.00080/cand (1.1%) 0.00015/cand (0.4%)
v13 FullPos−FullNeg (Large) 0.00764/cand (2.2%) 0.00196/cand (1.6%) 0.00020/cand (0.3%) 0.00016/cand (0.4%)
v14 Warm−Robotic (Large) 0.00712/cand (2.1%) 0.00150/cand (1.3%) 0.00040/cand (0.6%) 0.00019/cand (0.6%)
v15 Natural−Robotic (Small) 0.00851/cand (2.3%) 0.00249/cand (2.0%) 0.00048/cand (0.6%) 0.00030/cand (0.8%)
v16 Authentic−Cheap (Small) 0.00622/cand (1.7%) 0.00465/cand (3.7%) 0.00030/cand (0.4%) 0.00042/cand (1.1%)
v17 Professional−Distorted (Small) 0.00599/cand (1.8%) 0.00183/cand (1.6%) 0.00054/cand (0.8%) 0.00006/cand (0.2%)
v18 Expressive−Flat (Small) 0.00824/cand (2.6%) 0.00207/cand (1.9%) 0.00163/cand (2.5%) 0.00042/cand (1.2%)
v19 FullPos−FullNeg (Small) 0.00879/cand (2.7%) 0.00162/cand (1.4%) 0.00065/cand (1.0%) 0.00009/cand (0.3%)
v20 Warm−Robotic (Small) 0.00836/cand (2.4%) 0.00236/cand (2.0%) 0.00085/cand (1.2%) 0.00034/cand (0.9%)
v21 Sanitized Prompt (Large) 0.00438/cand (1.9%) 0.00145/cand (1.9%) 0.00025/cand (0.5%) 0.00025/cand (1.1%)
v22 Sanitized Prompt (Small) -0.00014/cand (-0.1%) 0.00079/cand (1.4%) 0.00067/cand (2.0%) 0.00039/cand (2.3%)
v23 Sanitized−Uncanny (Large) 0.00639/cand (1.6%) 0.00252/cand (1.9%) 0.00040/cand (0.5%) 0.00022/cand (0.5%)
v24 Sanitized−Uncanny (Small) 0.00316/cand (0.9%) 0.00230/cand (1.9%) 0.00177/cand (2.4%) 0.00023/cand (0.6%)
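The marginal-improvement cells follow directly from the cross-method averages. A small sketch (hypothetical function name) that reproduces the Standard row from its rounded four-decimal averages:

```python
def marginal_table(ns, avg_best):
    """Marginal improvement between consecutive N values.

    Returns (label, gain per added candidate, relative gain over the interval in %).
    """
    rows = []
    for i in range(len(ns) - 1):
        n1, n2 = ns[i], ns[i + 1]
        b1, b2 = avg_best[i], avg_best[i + 1]
        delta = b2 - b1
        rows.append((f"N={n1}→{n2}", delta / (n2 - n1), 100.0 * delta / b1))
    return rows


NS = [5, 10, 25, 50, 100]
# Avg Best for "Standard: (1−WER) × Content Enjoyment" from the cross-method table.
STANDARD_AVG_BEST = [3.9546, 4.0007, 4.0772, 4.1326, 4.1640]

for label, per_cand, pct in marginal_table(NS, STANDARD_AVG_BEST):
    print(f"{label}: {per_cand:.5f}/cand ({pct:.1f}%)")
```

Because the inputs are the rounded averages, the first cell comes out as 0.00922 rather than the table's 0.00920; the percentages match.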

Ablation: Pronunciation Suffix Effect

Best-of-10 selection without vs. with the pronunciation suffix. Δ columns are the with-suffix value minus the without-suffix value.

Ranking Method | Without Suffix Mean | Without Suffix Best | Without Suffix Median | With Suffix Mean | With Suffix Best | With Suffix Median | Δ Mean | Δ Best
Standard: (1−WER) × Content Enjoyment | 3.7109 | 4.0097 | 3.7324 | 3.6732 | 4.0604 | 3.7128 | -0.0377 | +0.0507
VoiceCLAP-Large × Quality Text | 0.8616 | 0.9271 | 0.8699 | 0.7575 | 0.8228 | 0.7704 | -0.1040 | -0.1043
VoiceCLAP-Small × Quality Text | 0.6581 | 0.7482 | 0.6658 | 0.7575 | 0.8228 | 0.7704 | +0.0995 | +0.0746
VoiceCLAP-Large × Prompt Match | 1.3126 | 1.4088 | 1.3272 | 0.7575 | 0.8228 | 0.7704 | -0.5551 | -0.5860
VoiceCLAP-Small × Prompt Match | 0.7722 | 0.8780 | 0.7722 | 0.7575 | 0.8228 | 0.7704 | -0.0147 | -0.0552
v1 Natural (Large) | 0.9228 | 0.9954 | 0.9296 | 0.7575 | 0.8228 | 0.7704 | -0.1653 | -0.1726
v2 Authentic (Large) | 0.9555 | 1.0287 | 0.9655 | 0.7575 | 0.8228 | 0.7704 | -0.1979 | -0.2059
v3 Professional (Large) | 0.8112 | 0.8743 | 0.8188 | 0.7575 | 0.8228 | 0.7704 | -0.0537 | -0.0515
v4 Expressive (Large) | 0.9502 | 1.0332 | 0.9583 | 0.7575 | 0.8228 | 0.7704 | -0.1927 | -0.2104
v5 Cinematic (Large) | 0.9240 | 0.9983 | 0.9311 | 0.7575 | 0.8228 | 0.7704 | -0.1664 | -0.1755
v6 Natural (Small) | 0.7164 | 0.7915 | 0.7249 | 0.7575 | 0.8228 | 0.7704 | +0.0411 | +0.0313
v7 Authentic (Small) | 0.6906 | 0.7536 | 0.7012 | 0.7575 | 0.8228 | 0.7704 | +0.0669 | +0.0692
v8 Professional (Small) | 0.5511 | 0.6205 | 0.5614 | 0.7575 | 0.8228 | 0.7704 | +0.2064 | +0.2023
v9 Natural−Robotic (Large) | 1.6632 | 1.7826 | 1.6782 | 1.5151 | 1.6456 | 1.5408 | -0.1481 | -0.1370
v10 Authentic−Cheap (Large) | 1.7424 | 1.8624 | 1.7614 | 1.5151 | 1.6456 | 1.5408 | -0.2273 | -0.2168
v11 Professional−Distorted (Large) | 1.6475 | 1.7658 | 1.6623 | 1.5151 | 1.6456 | 1.5408 | -0.1324 | -0.1203
v12 Expressive−Flat (Large) | 1.6454 | 1.7703 | 1.6586 | 1.5151 | 1.6456 | 1.5408 | -0.1304 | -0.1248
v13 FullPos−FullNeg (Large) | 1.6652 | 1.7810 | 1.6805 | 1.5151 | 1.6456 | 1.5408 | -0.1502 | -0.1355
v14 Warm−Robotic (Large) | 1.5847 | 1.6941 | 1.5959 | 1.5151 | 1.6456 | 1.5408 | -0.0696 | -0.0485
v15 Natural−Robotic (Small) | 1.7221 | 1.8449 | 1.7406 | 1.5151 | 1.6456 | 1.5408 | -0.2070 | -0.1993
v16 Authentic−Cheap (Small) | 1.7422 | 1.8908 | 1.7639 | 1.5151 | 1.6456 | 1.5408 | -0.2271 | -0.2452
v17 Professional−Distorted (Small) | 1.5965 | 1.7306 | 1.6031 | 1.5151 | 1.6456 | 1.5408 | -0.0814 | -0.0850
v18 Expressive−Flat (Small) | 1.4722 | 1.6289 | 1.4714 | 1.5151 | 1.6456 | 1.5408 | +0.0428 | +0.0167
v19 FullPos−FullNeg (Small) | 1.5419 | 1.6612 | 1.5577 | 1.5151 | 1.6456 | 1.5408 | -0.0269 | -0.0156
v20 Warm−Robotic (Small) | 1.6154 | 1.7658 | 1.6233 | 1.5151 | 1.6456 | 1.5408 | -0.1004 | -0.1202
v21 Sanitized Prompt (Large) | 1.0835 | 1.1674 | 1.0950 | 0.7575 | 0.8228 | 0.7704 | -0.3260 | -0.3446
v22 Sanitized Prompt (Small) | 0.7377 | 0.8380 | 0.7399 | 0.7575 | 0.8228 | 0.7704 | +0.0199 | -0.0152
v23 Sanitized−Uncanny (Large) | 1.8437 | 1.9842 | 1.8619 | 1.5151 | 1.6456 | 1.5408 | -0.3286 | -0.3386
v24 Sanitized−Uncanny (Small) | 1.6793 | 1.8623 | 1.6847 | 1.5151 | 1.6456 | 1.5408 | -0.1643 | -0.2167

Per-Prompt Ablation: Standard Reward (N=10)

# | Lang | No Suffix Mean | No Suffix Best | With Suffix Mean | With Suffix Best | Δ Mean | Δ Best
0 | English | 4.4816 | 4.6844 | 4.3707 | 4.8535 | -0.1109 | +0.1691
1 | French | 4.7925 | 5.0537 | 4.7892 | 5.0296 | -0.0034 | -0.0241
2 | English | 1.0827 | 1.2962 | 1.1411 | 1.3218 | +0.0585 | +0.0256
3 | German | 4.8200 | 4.9088 | 4.7944 | 4.9058 | -0.0256 | -0.0030
4 | French | 4.5691 | 4.7517 | 4.5799 | 4.8368 | +0.0107 | +0.0851
5 | French | 3.7270 | 4.2955 | 3.8205 | 4.6849 | +0.0936 | +0.3894
6 | English | 3.0129 | 3.3577 | 3.0892 | 3.7102 | +0.0763 | +0.3525
7 | German | 2.3007 | 2.9392 | 2.1858 | 2.5297 | -0.1149 | -0.4095
8 | Spanish | 4.7659 | 4.9141 | 4.7331 | 4.8429 | -0.0327 | -0.0712
9 | French | 3.5568 | 3.8958 | 3.2281 | 3.8884 | -0.3288 | -0.0074
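Averaging the per-prompt deltas recovers the Standard row of the aggregate ablation table (Δ Mean -0.0377, Δ Best +0.0507); the per-prompt view also shows the suffix raised the best reward on 5 of the 10 prompts. A small sketch with a hypothetical helper name:

```python
from statistics import mean

# (no-suffix mean, no-suffix best, with-suffix mean, with-suffix best) per prompt,
# copied from the per-prompt ablation table (Standard reward, N=10).
PER_PROMPT = [
    (4.4816, 4.6844, 4.3707, 4.8535),  # 0 English
    (4.7925, 5.0537, 4.7892, 5.0296),  # 1 French
    (1.0827, 1.2962, 1.1411, 1.3218),  # 2 English
    (4.8200, 4.9088, 4.7944, 4.9058),  # 3 German
    (4.5691, 4.7517, 4.5799, 4.8368),  # 4 French
    (3.7270, 4.2955, 3.8205, 4.6849),  # 5 French
    (3.0129, 3.3577, 3.0892, 3.7102),  # 6 English
    (2.3007, 2.9392, 2.1858, 2.5297),  # 7 German
    (4.7659, 4.9141, 4.7331, 4.8429),  # 8 Spanish
    (3.5568, 3.8958, 3.2281, 3.8884),  # 9 French
]


def ablation_summary(rows):
    """Average the per-prompt deltas (with suffix minus without suffix)."""
    d_mean = [w_mean - n_mean for n_mean, _, w_mean, _ in rows]
    d_best = [w_best - n_best for _, n_best, _, w_best in rows]
    return {
        "delta_mean": mean(d_mean),
        "delta_best": mean(d_best),
        "prompts_where_best_improved": sum(d > 0 for d in d_best),
    }


print(ablation_summary(PER_PROMPT))
```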

Standard: (1−WER) × Content Enjoyment — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 3.9546 4.0007 4.0772 4.1326 4.1640
Std Dev 1.2168 1.2467 1.2122 1.2253 1.2061
Avg Mean 3.6288 3.6854 3.6552 3.6658 3.6684

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 4.8495 4.8495 4.8853 4.9242 4.9242
1 French 4.8763 4.9786 5.0537 5.0537 5.0537
2 English 1.3062 1.3142 1.3050 1.3045 1.3245
3 German 4.9532 4.9336 4.9589 4.9589 4.9703
4 French 4.5461 4.7517 4.7825 4.8121 4.8121
5 French 4.2109 4.3700 4.3700 4.6713 4.7707
6 English 3.8101 3.6776 3.7326 3.9251 3.9251
7 German 2.5013 2.4460 2.7952 2.7806 2.9489
8 Spanish 4.9337 4.9628 4.9645 4.9645 4.9645
9 French 3.5592 3.7227 3.9241 3.9309 3.9459
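The summary rows are consistent with simple statistics over the 10 per-prompt values: Avg Best is their mean, and Std Dev matches their sample standard deviation (n − 1 denominator). A sketch reproducing the Standard method's N=5 and N=100 columns from the per-prompt table (the names are hypothetical):

```python
from statistics import mean, stdev

# Per-prompt best rewards (prompts 0-9) for the Standard method, from the table above.
STANDARD_BEST = {
    5: [4.8495, 4.8763, 1.3062, 4.9532, 4.5461, 4.2109, 3.8101, 2.5013, 4.9337, 3.5592],
    100: [4.9242, 5.0537, 1.3245, 4.9703, 4.8121, 4.7707, 3.9251, 2.9489, 4.9645, 3.9459],
}


def summarize(per_prompt_best):
    """Return (Avg Best, Std Dev): mean and sample standard deviation across prompts."""
    return mean(per_prompt_best), stdev(per_prompt_best)


for n, values in STANDARD_BEST.items():
    avg, sd = summarize(values)
    print(f"N={n}: Avg Best {avg:.4f}, Std Dev {sd:.4f}")
```

Using the population standard deviation (`statistics.pstdev`) instead would give about 1.144 at N=100, which does not match the reported 1.2061.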

VoiceCLAP-Large × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.9101 0.9363 0.9531 0.9551 0.9596
Std Dev 0.2855 0.2926 0.2796 0.2767 0.2772
Avg Mean 0.8395 0.8520 0.8452 0.8502 0.8505

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.1649 1.2016 1.2032 1.2000 1.2083
1 French 1.1614 1.1634 1.1764 1.1764 1.1764
2 English 0.3377 0.3422 0.3625 0.3625 0.3625
3 German 1.1193 1.1491 1.1478 1.1506 1.1506
4 French 1.2467 1.2702 1.2753 1.2651 1.2753
5 French 0.8431 0.9029 0.9029 0.9199 0.9329
6 English 0.8819 0.9138 0.9167 0.9167 0.9167
7 German 0.5716 0.5705 0.6449 0.6559 0.6679
8 Spanish 0.8917 0.8849 0.9194 0.9194 0.9194
9 French 0.8825 0.9643 0.9823 0.9850 0.9861

VoiceCLAP-Small × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.7434 0.7655 0.7740 0.7969 0.7989
Std Dev 0.2771 0.2999 0.2801 0.2858 0.2856
Avg Mean 0.6489 0.6534 0.6377 0.6473 0.6468

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.8133 0.7898 0.8325 0.8476 0.8532
1 French 0.9710 1.0147 1.0117 1.0346 1.0346
2 English 0.2543 0.2492 0.2606 0.2626 0.2656
3 German 1.0623 1.0517 1.0732 1.0850 1.0850
4 French 1.1038 1.1580 1.0973 1.1547 1.1580
5 French 0.5162 0.5541 0.6197 0.6439 0.6439
6 English 0.7100 0.7589 0.7611 0.7611 0.7611
7 German 0.4707 0.4178 0.5047 0.5004 0.5047
8 Spanish 0.6328 0.6432 0.5757 0.6616 0.6616
9 French 0.8997 1.0174 1.0032 1.0174 1.0211

VoiceCLAP-Large × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.3972 1.4204 1.4363 1.4510 1.4585
Std Dev 0.4534 0.4511 0.4395 0.4345 0.4370
Avg Mean 1.2788 1.3014 1.2854 1.2922 1.2937

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.7369 1.7760 1.7637 1.7724 1.7760
1 French 1.8243 1.8369 1.8330 1.8369 1.8409
2 English 0.4488 0.4581 0.4642 0.4642 0.4720
3 German 1.8076 1.8342 1.8335 1.8311 1.8435
4 French 1.5887 1.5845 1.5863 1.5968 1.5968
5 French 1.5383 1.6107 1.6307 1.6593 1.6960
6 English 1.3237 1.2919 1.3063 1.3304 1.3304
7 German 0.8218 0.8661 0.9455 1.0004 1.0004
8 Spanish 1.6505 1.6173 1.6498 1.6498 1.6599
9 French 1.2318 1.3282 1.3502 1.3687 1.3687

VoiceCLAP-Small × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.8601 0.8677 0.8774 0.8891 0.9028
Std Dev 0.2878 0.2836 0.2824 0.2867 0.2802
Avg Mean 0.7585 0.7676 0.7591 0.7618 0.7624

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.3089 1.3236 1.3231 1.3180 1.3236
1 French 0.8770 0.8758 0.8933 0.8933 0.8978
2 English 0.3393 0.3120 0.3140 0.3172 0.3393
3 German 1.1609 1.1798 1.1709 1.1873 1.1873
4 French 0.8536 0.8944 0.8916 0.9098 0.9098
5 French 0.9161 1.0036 1.0036 0.9609 1.0036
6 English 1.1154 0.9546 1.0126 1.1154 1.1198
7 German 0.5875 0.6317 0.6484 0.6637 0.7078
8 Spanish 0.7441 0.7594 0.7529 0.7690 0.7746
9 French 0.6984 0.7422 0.7640 0.7566 0.7640

v1 Natural (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.9750 1.0031 1.0236 1.0246 1.0333
Std Dev 0.2898 0.2938 0.2902 0.2866 0.2863
Avg Mean 0.8977 0.9104 0.9045 0.9083 0.9090

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.2812 1.3153 1.3167 1.3153 1.3167
1 French 1.2239 1.2264 1.2587 1.2587 1.2587
2 English 0.3855 0.3863 0.3988 0.3988 0.3988
3 German 1.1058 1.1254 1.1241 1.1259 1.1259
4 French 1.3085 1.3121 1.3425 1.3218 1.3425
5 French 0.9486 1.0200 1.0200 1.0337 1.0564
6 English 0.9223 0.9729 0.9929 0.9929 0.9982
7 German 0.6271 0.6375 0.6959 0.7044 0.7279
8 Spanish 0.9762 0.9762 0.9916 0.9957 1.0097
9 French 0.9714 1.0590 1.0952 1.0985 1.0985

v2 Authentic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.0092 1.0361 1.0536 1.0591 1.0631
Std Dev 0.2796 0.2874 0.2747 0.2752 0.2742
Avg Mean 0.9293 0.9444 0.9367 0.9417 0.9420

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.1473 1.1727 1.1743 1.1738 1.1743
1 French 1.2769 1.2900 1.2976 1.2976 1.2976
2 English 0.3942 0.3923 0.4098 0.4072 0.4110
3 German 1.1747 1.2026 1.1973 1.2026 1.2026
4 French 1.2812 1.3065 1.3131 1.3131 1.3131
5 French 1.0562 1.1132 1.1132 1.1470 1.1571
6 English 0.9454 0.9761 0.9719 0.9842 0.9996
7 German 0.6863 0.6881 0.7825 0.7917 0.7929
8 Spanish 1.1554 1.1523 1.1853 1.1853 1.1867
9 French 0.9747 1.0676 1.0913 1.0881 1.0964

v3 Professional (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.8574 0.8850 0.9008 0.9019 0.9088
Std Dev 0.2657 0.2709 0.2551 0.2524 0.2551
Avg Mean 0.7889 0.8015 0.7963 0.8004 0.8005

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.0724 1.1219 1.1183 1.1142 1.1367
1 French 1.1011 1.1017 1.1149 1.1149 1.1149
2 English 0.3100 0.3274 0.3536 0.3536 0.3536
3 German 1.0849 1.1021 1.0997 1.1039 1.1039
4 French 1.1436 1.1703 1.1724 1.1602 1.1724
5 French 0.7820 0.8424 0.8424 0.8601 0.8731
6 English 0.8488 0.8953 0.8884 0.8884 0.8953
7 German 0.5576 0.5474 0.6260 0.6353 0.6388
8 Spanish 0.8439 0.8409 0.8767 0.8667 0.8770
9 French 0.8301 0.9009 0.9151 0.9220 0.9220

v4 Expressive (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.0185 1.0361 1.0584 1.0717 1.0756
Std Dev 0.2936 0.2848 0.2884 0.2924 0.2915
Avg Mean 0.9240 0.9409 0.9344 0.9365 0.9368

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.2724 1.2842 1.2953 1.2953 1.2953
1 French 1.2554 1.2501 1.3152 1.3152 1.3152
2 English 0.3948 0.4184 0.4087 0.4138 0.4184
3 German 1.0474 1.0655 1.0659 1.0767 1.0767
4 French 1.2318 1.2347 1.2657 1.2965 1.2965
5 French 1.1161 1.1557 1.1742 1.2255 1.2255
6 English 1.1639 1.1658 1.1415 1.1725 1.1850
7 German 0.6103 0.6372 0.7032 0.7113 0.7113
8 Spanish 1.1484 1.1364 1.1781 1.1781 1.1781
9 French 0.9444 1.0127 1.0360 1.0324 1.0538

v5 Cinematic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.9942 1.0057 1.0249 1.0338 1.0391
Std Dev 0.2707 0.2753 0.2622 0.2664 0.2675
Avg Mean 0.8979 0.9136 0.9063 0.9094 0.9100

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.0969 1.1129 1.1355 1.1355 1.1355
1 French 1.2305 1.2361 1.2439 1.2439 1.2460
2 English 0.4020 0.3989 0.4057 0.4133 0.4133
3 German 1.1007 1.1249 1.1249 1.1132 1.1249
4 French 1.2252 1.2376 1.2442 1.2570 1.2570
5 French 1.1337 1.1758 1.1852 1.2318 1.2494
6 English 1.0978 1.0669 1.0705 1.0978 1.0978
7 German 0.6140 0.6238 0.7296 0.7210 0.7296
8 Spanish 1.0503 1.0292 1.0467 1.0494 1.0540
9 French 0.9911 1.0506 1.0623 1.0748 1.0836

v6 Natural (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.7754 0.8111 0.8265 0.8481 0.8562
Std Dev 0.2489 0.2572 0.2529 0.2562 0.2528
Avg Mean 0.7011 0.7094 0.6968 0.7031 0.7030

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.5961 0.6320 0.6632 0.6942 0.7206
1 French 1.0722 1.0964 1.1172 1.1408 1.1408
2 English 0.3032 0.2928 0.3076 0.3139 0.3139
3 German 1.0064 0.9901 1.0367 1.0477 1.0477
4 French 1.0332 1.0332 1.0546 1.0809 1.0809
5 French 0.6138 0.6983 0.6910 0.7417 0.7417
6 English 0.7623 0.8702 0.7969 0.8378 0.8702
7 German 0.5840 0.5762 0.6664 0.6450 0.6664
8 Spanish 0.8426 0.8736 0.8782 0.9211 0.9211
9 French 0.9404 1.0485 1.0534 1.0583 1.0583

v7 Authentic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.7471 0.7727 0.7910 0.8014 0.8104
Std Dev 0.2238 0.2381 0.2310 0.2308 0.2324
Avg Mean 0.6756 0.6847 0.6717 0.6787 0.6788

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.6445 0.6336 0.6879 0.7069 0.7227
1 French 1.0493 1.0703 1.0736 1.0749 1.0749
2 English 0.3181 0.3160 0.3209 0.3221 0.3221
3 German 0.9596 0.9767 1.0000 0.9866 1.0000
4 French 0.9540 0.9726 0.9779 1.0113 1.0113
5 French 0.6895 0.7726 0.7645 0.8041 0.8041
6 English 0.6247 0.7115 0.6667 0.6856 0.7115
7 German 0.5465 0.4909 0.5858 0.5857 0.5858
8 Spanish 0.8518 0.8706 0.9000 0.9041 0.9041
9 French 0.8327 0.9118 0.9322 0.9322 0.9677

v8 Professional (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.6180 0.6390 0.6422 0.6624 0.6708
Std Dev 0.2338 0.2402 0.2228 0.2209 0.2280
Avg Mean 0.5415 0.5460 0.5348 0.5412 0.5416

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.8310 0.7950 0.8327 0.8434 0.8599
1 French 0.7541 0.8065 0.7737 0.8065 0.8065
2 English 0.2236 0.2211 0.2369 0.2363 0.2401
3 German 0.8567 0.8666 0.8691 0.8600 0.8737
4 French 0.9368 0.9622 0.8921 0.9368 0.9622
5 French 0.3894 0.4351 0.4726 0.4907 0.4907
6 English 0.5885 0.6310 0.6484 0.6484 0.6484
7 German 0.3994 0.3874 0.4370 0.4849 0.4849
8 Spanish 0.5215 0.5180 0.4820 0.5390 0.5390
9 French 0.6790 0.7669 0.7776 0.7776 0.8027

v9 Natural−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7588 1.7906 1.8169 1.8278 1.8339
Std Dev 0.5025 0.5067 0.4897 0.4912 0.4898
Avg Mean 1.6184 1.6460 1.6297 1.6368 1.6386

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.1794 2.1794 2.1909 2.1909 2.1909
1 French 2.1325 2.1447 2.1478 2.1525 2.1525
2 English 0.6168 0.6158 0.6198 0.6226 0.6226
3 German 2.0821 2.0975 2.0896 2.1060 2.1060
4 French 2.2102 2.2160 2.2246 2.2251 2.2251
5 French 1.8136 1.9156 1.9156 1.9413 1.9714
6 English 1.6954 1.7115 1.7439 1.7492 1.7581
7 German 1.2074 1.2167 1.3769 1.3874 1.4091
8 Spanish 1.9343 1.9343 1.9553 1.9912 1.9912
9 French 1.7165 1.8746 1.9047 1.9120 1.9120

v10 Authentic−Cheap (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.8458 1.8761 1.9049 1.9166 1.9189
Std Dev 0.5173 0.5306 0.5105 0.5131 0.5123
Avg Mean 1.6952 1.7243 1.7067 1.7155 1.7166

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.9831 1.9831 1.9832 1.9915 1.9915
1 French 2.3181 2.3265 2.3384 2.3384 2.3384
2 English 0.6727 0.6644 0.6739 0.6739 0.6788
3 German 2.1530 2.1925 2.1829 2.1925 2.1925
4 French 2.3130 2.3408 2.3496 2.3500 2.3500
5 French 2.0244 2.1040 2.1151 2.1764 2.1764
6 English 1.7337 1.7350 1.7275 1.7556 1.7591
7 German 1.2732 1.2647 1.4695 1.4734 1.4734
8 Spanish 2.1688 2.1482 2.1943 2.1943 2.2014
9 French 1.8184 2.0021 2.0147 2.0205 2.0279

v11 Professional−Distorted (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7378 1.7827 1.8108 1.8153 1.8213
Std Dev 0.5083 0.5139 0.4875 0.4872 0.4886
Avg Mean 1.5996 1.6267 1.6137 1.6218 1.6225

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.1264 2.1857 2.1937 2.1941 2.1961
1 French 2.2038 2.1910 2.2209 2.2209 2.2209
2 English 0.6428 0.6665 0.6948 0.6948 0.6948
3 German 2.0039 2.0346 2.0442 2.0442 2.0442
4 French 2.2690 2.3018 2.2951 2.2947 2.3018
5 French 1.7740 1.8894 1.8894 1.9319 1.9457
6 English 1.6992 1.7574 1.7450 1.7450 1.7574
7 German 1.1192 1.1122 1.2889 1.2986 1.2986
8 Spanish 1.8646 1.8423 1.8783 1.8694 1.8866
9 French 1.6754 1.8463 1.8581 1.8597 1.8670

v12 Expressive−Flat (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7552 1.7865 1.8133 1.8333 1.8406
Std Dev 0.5150 0.5011 0.4986 0.5046 0.5052
Avg Mean 1.5997 1.6314 1.6166 1.6206 1.6219

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.1971 2.1971 2.2140 2.2140 2.2140
1 French 2.1709 2.1557 2.2459 2.2459 2.2459
2 English 0.5959 0.6191 0.6191 0.6222 0.6233
3 German 1.8857 1.9078 1.9085 1.9139 1.9493
4 French 2.0865 2.1030 2.1157 2.1549 2.1549
5 French 1.9484 2.0159 2.0159 2.1084 2.1097
6 English 1.9152 1.9041 1.8811 1.9226 1.9431
7 German 1.1279 1.1905 1.3057 1.3265 1.3281
8 Spanish 2.0190 2.0190 2.0544 2.0650 2.0650
9 French 1.6051 1.7526 1.7724 1.7591 1.7724

v13 FullPos−FullNeg (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7552 1.7934 1.8229 1.8280 1.8360
Std Dev 0.5249 0.5279 0.5106 0.5108 0.5092
Avg Mean 1.6186 1.6464 1.6315 1.6389 1.6404

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.1390 2.1669 2.1769 2.1760 2.1769
1 French 2.2840 2.2839 2.3132 2.3132 2.3132
2 English 0.6080 0.6223 0.6360 0.6360 0.6360
3 German 2.0264 2.0533 2.0575 2.0575 2.0575
4 French 2.2490 2.2571 2.2751 2.2666 2.2751
5 French 1.8284 1.9253 1.9253 1.9678 1.9873
6 English 1.7041 1.7305 1.7425 1.7425 1.7656
7 German 1.1266 1.1320 1.2916 1.2945 1.3186
8 Spanish 1.8750 1.8750 1.9048 1.9229 1.9229
9 French 1.7117 1.8881 1.9059 1.9030 1.9066

v14 Warm−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6719 1.7076 1.7300 1.7401 1.7497
Std Dev 0.5043 0.5091 0.4924 0.4933 0.4923
Avg Mean 1.5430 1.5696 1.5532 1.5615 1.5629

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0858 2.1127 2.1294 2.1209 2.1294
1 French 2.0716 2.0794 2.0854 2.0817 2.0959
2 English 0.5597 0.5595 0.5642 0.5700 0.5700
3 German 2.0887 2.1301 2.1107 2.1301 2.1301
4 French 2.1116 2.1116 2.1208 2.1287 2.1287
5 French 1.6915 1.7711 1.7711 1.8079 1.8485
6 English 1.6241 1.6486 1.6711 1.6711 1.6711
7 German 1.1051 1.1148 1.2604 1.2598 1.2932
8 Spanish 1.8116 1.8116 1.8490 1.8628 1.8628
9 French 1.5697 1.7361 1.7382 1.7676 1.7676

v15 Natural−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.8316 1.8742 1.9115 1.9234 1.9383
Std Dev 0.5408 0.5515 0.5329 0.5321 0.5352
Avg Mean 1.6750 1.7055 1.6843 1.6942 1.6950

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.8830 1.9719 2.0044 1.9967 2.0602
1 French 2.4418 2.4318 2.4515 2.4515 2.4515
2 English 0.6321 0.6397 0.6540 0.6512 0.6540
3 German 2.2744 2.2793 2.2945 2.3125 2.3125
4 French 2.3183 2.3586 2.4140 2.3801 2.4140
5 French 1.8445 1.9063 1.9413 2.0193 2.0193
6 English 1.8042 1.8912 1.8185 1.8422 1.8912
7 German 1.2370 1.2076 1.4390 1.4474 1.4474
8 Spanish 2.0023 2.0023 2.0029 2.0383 2.0383
9 French 1.8788 2.0530 2.0949 2.0949 2.0949

v16 Authentic−Cheap (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.8657 1.8969 1.9666 1.9741 1.9951
Std Dev 0.5473 0.5720 0.5630 0.5615 0.5636
Avg Mean 1.6937 1.7300 1.7060 1.7167 1.7168

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.7826 1.7826 1.8602 1.8893 1.9237
1 French 2.5373 2.4852 2.5373 2.5630 2.5630
2 English 0.6786 0.6458 0.6629 0.6604 0.6786
3 German 2.3800 2.4158 2.4166 2.4512 2.4512
4 French 2.2002 2.2913 2.3926 2.3190 2.3926
5 French 2.0492 2.1373 2.1570 2.2059 2.2122
6 English 1.6666 1.7311 1.7504 1.7504 1.7641
7 German 1.3244 1.2762 1.4962 1.5280 1.5280
8 Spanish 2.1587 2.2440 2.3004 2.2813 2.3004
9 French 1.8798 1.9593 2.0928 2.0928 2.1373

v17 Professional−Distorted (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7016 1.7315 1.7590 1.7725 1.7755
Std Dev 0.5213 0.5325 0.5174 0.5154 0.5164
Avg Mean 1.5544 1.5782 1.5583 1.5685 1.5697

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0680 2.1123 2.1329 2.1329 2.1329
1 French 2.1845 2.1812 2.1980 2.1980 2.1980
2 English 0.5533 0.5410 0.5539 0.5615 0.5615
3 German 2.0693 2.0525 2.0693 2.0852 2.0891
4 French 2.2154 2.2812 2.2721 2.2721 2.2812
5 French 1.7162 1.7538 1.8142 1.8170 1.8304
6 English 1.6823 1.6819 1.6790 1.7115 1.7115
7 German 1.1034 1.1275 1.2649 1.2747 1.2775
8 Spanish 1.7463 1.7329 1.7396 1.7926 1.7926
9 French 1.6772 1.8512 1.8659 1.8799 1.8799

v18 Expressive−Flat (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.5761 1.6173 1.6483 1.6891 1.7099
Std Dev 0.4859 0.4723 0.4746 0.4751 0.4785
Avg Mean 1.4361 1.4513 1.4485 1.4490 1.4513

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0167 2.0921 2.0652 2.0561 2.0921
1 French 1.8167 1.7639 1.9345 1.9345 1.9345
2 English 0.5129 0.5415 0.5292 0.5292 0.5436
3 German 1.7639 1.8144 1.8174 1.8178 1.8178
4 French 1.7889 1.7824 1.9028 1.9040 1.9401
5 French 1.7407 1.7813 1.7813 1.9195 1.9680
6 English 1.6225 1.7011 1.6334 1.7386 1.7386
7 German 0.9665 1.0668 1.1447 1.2484 1.2484
8 Spanish 2.0735 2.0618 2.0405 2.1084 2.1084
9 French 1.4591 1.5680 1.6341 1.6341 1.7073

v19 FullPos−FullNeg (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6357 1.6796 1.7039 1.7201 1.7248
Std Dev 0.4760 0.4878 0.4697 0.4647 0.4662
Avg Mean 1.5001 1.5248 1.5075 1.5163 1.5178

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.8948 1.9396 1.9764 1.9849 1.9849
1 French 2.0914 2.1129 2.1206 2.1206 2.1206
2 English 0.5701 0.5703 0.5808 0.5903 0.5903
3 German 2.0078 1.9958 2.0078 2.0045 2.0092
4 French 2.0486 2.1202 2.0950 2.1063 2.1202
5 French 1.6271 1.6975 1.7564 1.7724 1.7825
6 English 1.5685 1.6643 1.5997 1.6460 1.6643
7 German 1.1157 1.1053 1.2710 1.3073 1.3073
8 Spanish 1.8430 1.8430 1.8656 1.8656 1.8656
9 French 1.5896 1.7474 1.7656 1.8034 1.8034

v20 Warm−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7297 1.7715 1.8069 1.8281 1.8450
Std Dev 0.5512 0.5606 0.5566 0.5483 0.5554
Avg Mean 1.5748 1.5983 1.5818 1.5902 1.5910

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0987 2.1787 2.1955 2.2008 2.2211
1 French 2.2335 2.1805 2.2823 2.2823 2.2823
2 English 0.5374 0.5468 0.5585 0.5585 0.5585
3 German 2.1886 2.1958 2.1843 2.2180 2.2180
4 French 2.2001 2.2611 2.3347 2.2925 2.3426
5 French 1.8094 1.8033 1.8196 1.8954 1.9618
6 English 1.6844 1.7625 1.7094 1.7348 1.7625
7 German 1.0359 1.0306 1.1912 1.2506 1.2506
8 Spanish 1.7668 1.8401 1.7810 1.8356 1.8401
9 French 1.7423 1.9158 2.0128 2.0128 2.0128
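The per-prompt rows are not monotone in N (prompt 4 in the v20 table drops from 2.3347 at N=25 to 2.2925 at N=50, and the v22 Avg Best row dips from 0.8346 at N=5 to 0.8339 at N=10), which suggests each N column comes from a single random subset of the 100 candidates rather than from nested subsets. If a noise-free curve is wanted, one option, a swapped-in technique and not necessarily what this report uses, is the exact expected maximum of N candidates drawn without replacement from the pool of M scored candidates: sort the rewards ascending as r_(1) ≤ r_(2) ≤ ... ≤ r_(M); the i-th smallest is the maximum of a uniformly random N-subset with probability C(i − 1, N − 1) / C(M, N). The function name and the toy pool below are hypothetical.

```python
from math import comb


def expected_best_of_n(rewards, n):
    """Exact expected maximum reward of a uniformly random n-subset (no replacement)."""
    m = len(rewards)
    if not 1 <= n <= m:
        raise ValueError(f"n must be between 1 and {m}, got {n}")
    ordered = sorted(rewards)
    # The i-th smallest reward (1-indexed) is the subset maximum in comb(i - 1, n - 1)
    # of the comb(m, n) equally likely subsets; comb returns 0 when i - 1 < n - 1.
    weighted = sum(r * comb(i - 1, n - 1) for i, r in enumerate(ordered, start=1))
    return weighted / comb(m, n)


if __name__ == "__main__":
    pool = [0.9, 1.4, 1.1, 2.0, 1.7]  # hypothetical per-candidate rewards for one prompt
    for n in range(1, len(pool) + 1):
        print(n, round(expected_best_of_n(pool, n), 4))
```

With N=1 this reduces to the mean reward and with N=M to the pool maximum. Removing a random element from a uniformly random (N+1)-subset leaves a uniformly random N-subset whose maximum can only be lower or equal, so the result never decreases as N grows.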

v21 Sanitized Prompt (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.1532 1.1751 1.1969 1.2031 1.2158
Std Dev 0.3592 0.3584 0.3526 0.3479 0.3495
Avg Mean 1.0533 1.0724 1.0610 1.0653 1.0664

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.5740 1.5844 1.5813 1.5813 1.5940
1 French 1.4557 1.4532 1.4931 1.4931 1.4931
2 English 0.4226 0.4262 0.4321 0.4321 0.4461
3 German 1.3554 1.3975 1.4017 1.3870 1.4017
4 French 1.2531 1.2596 1.2630 1.2702 1.2704
5 French 1.3202 1.3904 1.4050 1.4208 1.4602
6 English 1.2157 1.2186 1.2155 1.2302 1.2619
7 German 0.6456 0.6824 0.7430 0.7800 0.7846
8 Spanish 1.2416 1.2072 1.2465 1.2487 1.2487
9 French 1.0485 1.1319 1.1879 1.1879 1.1972

v22 Sanitized Prompt (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.8346 0.8339 0.8457 0.8624 0.8819
Std Dev 0.2829 0.2756 0.2813 0.2823 0.2844
Avg Mean 0.7299 0.7319 0.7253 0.7279 0.7295

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.3277 1.3434 1.3526 1.3339 1.3526
1 French 0.8354 0.8514 0.8513 0.8579 0.8620
2 English 0.3389 0.3153 0.3164 0.3184 0.3389
3 German 1.0372 1.0500 1.0550 1.0646 1.0868
4 French 0.7777 0.7777 0.7938 0.8160 0.8160
5 French 0.8999 0.9857 0.9857 0.9709 1.0191
6 English 1.1177 0.9355 1.0244 1.1225 1.1515
7 German 0.5824 0.6189 0.6292 0.6494 0.6994
8 Spanish 0.7867 0.7738 0.7439 0.7867 0.7878
9 French 0.6421 0.6872 0.7048 0.7035 0.7048

v23 Sanitized−Uncanny (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.9507 1.9826 2.0204 2.0303 2.0413
Std Dev 0.5932 0.5908 0.5770 0.5746 0.5733
Avg Mean 1.7915 1.8252 1.8047 1.8127 1.8148

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.5428 2.5578 2.5701 2.5701 2.5732
1 French 2.4516 2.4411 2.4838 2.4838 2.4838
2 English 0.6597 0.6714 0.6801 0.6801 0.6910
3 German 2.3243 2.3700 2.3712 2.3700 2.3712
4 French 2.2234 2.2287 2.2458 2.2472 2.2488
5 French 2.1543 2.2619 2.2863 2.3173 2.3656
6 English 1.9721 1.9571 1.9686 2.0113 2.0365
7 German 1.1920 1.2294 1.3822 1.4114 1.4271
8 Spanish 2.1722 2.1375 2.1884 2.1841 2.1884
9 French 1.8144 1.9714 2.0272 2.0272 2.0272
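To make the diminishing returns concrete, the Avg Best row can be converted into the average reward gained per additional candidate between successive N values. A minimal sketch using the v23 Sanitized−Uncanny (Large) row; the function name is illustrative.

```python
NS = [5, 10, 25, 50, 100]
# "Avg Best" row of the v23 Sanitized−Uncanny (Large) table.
AVG_BEST = [1.9507, 1.9826, 2.0204, 2.0303, 2.0413]


def marginal_gain_per_candidate(ns, best):
    """Average reward gained per extra candidate between consecutive N values."""
    return [
        (b1 - b0) / (n1 - n0)
        for n0, n1, b0, b1 in zip(ns, ns[1:], best, best[1:])
    ]


if __name__ == "__main__":
    gains = marginal_gain_per_candidate(NS, AVG_BEST)
    for n0, n1, g in zip(NS, NS[1:], gains):
        print(f"N={n0:>3} -> {n1:>3}: {g:.5f} per extra candidate")
```

With these numbers, each extra candidate between N=5 and N=10 adds about 0.0064 reward on average, versus about 0.0002 between N=50 and N=100; over the whole range, going from N=5 to N=100 adds 0.0906, roughly 4.6% of the N=5 value.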

v24 Sanitized−Uncanny (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.8188 1.8346 1.8690 1.9133 1.9251
Std Dev 0.5448 0.5561 0.5507 0.5476 0.5391
Avg Mean 1.6350 1.6602 1.6474 1.6528 1.6545

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.5393 2.5393 2.5454 2.5520 2.5520
1 French 2.1710 2.1591 2.2309 2.2309 2.2309
2 English 0.6485 0.6004 0.5913 0.6071 0.6485
3 German 2.2496 2.2748 2.2354 2.3283 2.3283
4 French 1.9298 2.0449 2.1306 2.0866 2.1306
5 French 1.9965 2.1434 2.1434 2.1468 2.1712
6 English 2.0356 1.8748 1.8860 2.0871 2.0871
7 German 1.2689 1.3276 1.4254 1.5185 1.5223
8 Spanish 1.8110 1.8026 1.8054 1.8557 1.8557
9 French 1.5374 1.5788 1.6966 1.7204 1.7243

Prompt #0 — English (Silicon Valley accent)

Language: English
Accent: Silicon Valley accent
Scored: 100/100
DramaBox Prompt
A young woman, possessing an extremely high fundamental frequency and bright, delicate harmonic texture, with a brisk, elevated momentum and a Silicon Valley accent; this is a pristine, high-quality studio voice recording with no background noise. She delivers the lines with a teasing lightness that occasionally borders on nervous energy, punctuated by small moments of genuine relief. (A brief, high-pitched Giggle escapes as she begins.) "Honestly, you think finding a solid Firestone review is that hard? Boggle, really. But look, that Lys thing actually worked." (She pauses, a subtle Contemplation washing over her features, then manages a slight, contained Chuckle.) "Just wait, I'll show you." She concludes with a soft, almost satisfied sigh, allowing the tension to dissipate.

Prompt #1 — French

Language: French
Scored: 100/100
DramaBox Prompt
High-pitched, delicately resonant, and possessing the slightly strained clarity of a young adult female soprano; the voice is bright and purely head-dominant, engineered for intimate projection.

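Pauses briefly, gathering strength. "Malgré la profondeur de cette sombre forêt, je sens toujours cette confiance absolue en mon chemin, guidée par la lumière." [English: "Despite the depth of this dark forest, I still feel this absolute confidence in my path, guided by the light."]
A slight, almost imperceptible hardening of tone. "Même au cœur de cette nuit insondable, ma boussole intérieure me montre la seule direction véritable." [English: "Even in the heart of this unfathomable night, my inner compass shows me the one true direction."]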
She finishes, a note of unwavering certainty settling.

The pace remains glacially slow throughout the utterance. The delivery conveys immense, quiet self-assurance.