Best-of-N Analysis: Diminishing Returns

10 Path A prompts × 100 candidates — 29 ranking methods — Page 1/5

Methodology: Ranking Method Formulas & Text Prompts

All methods use reward = (1 − WER) × max(score, 0). The score varies per method as described below.
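As a minimal sketch of the reward formula above, assuming WER is expressed as a fraction in [0, 1] and `score` is the method-specific score. The function name `reward` and the sample inputs are illustrative and do not come from the report.

```python
def reward(wer: float, score: float) -> float:
    """reward = (1 - WER) * max(score, 0), as stated in the methodology."""
    # Negative scores are clamped to zero before the WER penalty is applied.
    return (1.0 - wer) * max(score, 0.0)


# Illustrative inputs only (not taken from the report's data).
print(reward(0.10, 4.5))   # transcription errors scale the score down
print(reward(0.10, -0.2))  # a negative score yields zero reward
```

Because of the clamp, a pos−neg method whose negative prompt wins out contributes zero reward rather than a negative one.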

# | Key | Score Formula | Text Prompt(s)
0 | standard | Content Enjoyment |
1 | clap_lq | cos(audio, quality_text) | "pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
2 | clap_sq | cos(audio, quality_text) | "pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
3 | clap_lp | cos(audio, prompt) | Original DramaBox prompt
4 | clap_sp | cos(audio, prompt) | Original DramaBox prompt
5 | v1_nat_L | cos(audio, nat) | "natural, spontaneous, lifelike speech with genuine emotion"
6 | v2_auth_L | cos(audio, auth) | "authentic, emotionally truthful, deeply felt voice performance"
7 | v3_pro_L | cos(audio, pro) | "professional studio recording, crystal clear high-fidelity audio"
8 | v4_expr_L | cos(audio, expr) | "expressive, dynamic voice acting with rich emotional range"
9 | v5_cine_L | cos(audio, cine) | "immersive cinematic narration, compelling storytelling"
10 | v6_nat_S | cos(audio, nat) | "natural, spontaneous, lifelike speech with genuine emotion"
11 | v7_auth_S | cos(audio, auth) | "authentic, emotionally truthful, deeply felt voice performance"
12 | v8_pro_S | cos(audio, pro) | "professional studio recording, crystal clear high-fidelity audio"
13 | v9_nr_L | cos(audio, nat) − cos(audio, rob) | + "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
14 | v10_ac_L | cos(audio, auth) − cos(audio, cheap) | + "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
15 | v11_pd_L | cos(audio, pro) − cos(audio, dist) | + "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
16 | v12_ef_L | cos(audio, expr) − cos(audio, flat) | + "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
17 | v13_ff_L | cos(audio, full_pos) − cos(audio, full_neg) | + "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
18 | v14_wr_L | cos(audio, warm) − cos(audio, rob) | + "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
19 | v15_nr_S | cos(audio, nat) − cos(audio, rob) | + "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
20 | v16_ac_S | cos(audio, auth) − cos(audio, cheap) | + "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
21 | v17_pd_S | cos(audio, pro) − cos(audio, dist) | + "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
22 | v18_ef_S | cos(audio, expr) − cos(audio, flat) | + "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
23 | v19_ff_S | cos(audio, full_pos) − cos(audio, full_neg) | + "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
24 | v20_wr_S | cos(audio, warm) − cos(audio, rob) | + "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
25 | v21_san_L | cos(audio, sanitized_prompt) | Quoted speech removed (Large)
26 | v22_san_S | cos(audio, sanitized_prompt) | Quoted speech removed (Small)
27 | v23_snr_L | cos(audio, sanitized) − cos(audio, neg_san) | Sanitized / − "robotic, distorted, uncanny" (Large)
28 | v24_snr_S | cos(audio, sanitized) − cos(audio, neg_san) | Sanitized / − "robotic, distorted, uncanny" (Small)
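The positive-minus-negative score formulas in the table can be sketched as follows. This is an illustration under stated assumptions: the real audio and text embeddings would come from the VoiceCLAP encoders, which are not shown here, so toy vectors stand in for them. The helper names `cosine` and `contrastive_score` are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_score(audio: Sequence[float],
                      pos_text: Sequence[float],
                      neg_text: Sequence[float]) -> float:
    """cos(audio, pos) - cos(audio, neg), as in methods v9 to v20."""
    return cosine(audio, pos_text) - cosine(audio, neg_text)


# Toy 3-d embeddings (hypothetical stand-ins for encoder outputs).
audio = [1.0, 0.0, 0.0]
pos = [1.0, 0.0, 0.0]   # e.g. embedding of the "natural, ..." prompt
neg = [0.0, 1.0, 0.0]   # e.g. embedding of the "robotic, ..." prompt
print(contrastive_score(audio, pos, neg))
```

A positive-only method (v1 to v8) corresponds to calling `cosine(audio, pos)` alone.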

Cross-Method Diminishing Returns Comparison

Method N=5 N=10 N=25 N=50 N=100 Gain N=5→100 Knee Point
Standard: (1−WER) × Content Enjoyment 3.9211 4.0197 4.0497 4.1281 4.1523 +0.2311 N=25
VoiceCLAP-Large × Quality Text 0.9132 0.9378 0.9521 0.9581 0.9605 +0.0473 N=50
VoiceCLAP-Small × Quality Text 0.7602 0.7738 0.7914 0.8037 0.8143 +0.0542 N=50
VoiceCLAP-Large × Prompt Match 1.3985 1.4249 1.4389 1.4555 1.4595 +0.0610 N=25
VoiceCLAP-Small × Prompt Match 0.8512 0.8702 0.8706 0.8918 0.9013 +0.0501 N=25
v1 Natural (Large) 0.9764 1.0031 1.0191 1.0278 1.0301 +0.0537 N=50
v2 Authentic (Large) 1.0138 1.0394 1.0529 1.0635 1.0638 +0.0500 N=25
v3 Professional (Large) 0.8590 0.8849 0.8977 0.9053 0.9067 +0.0476 N=25
v4 Expressive (Large) 1.0251 1.0429 1.0563 1.0740 1.0753 +0.0502 N=25
v5 Cinematic (Large) 0.9964 1.0109 1.0247 1.0376 1.0394 +0.0430 N=25
v6 Natural (Small) 0.7862 0.8159 0.8379 0.8518 0.8621 +0.0759 N=50
v7 Authentic (Small) 0.7493 0.7736 0.8021 0.8069 0.8180 +0.0687 N=50
v8 Professional (Small) 0.6313 0.6456 0.6537 0.6698 0.6767 +0.0454 N=25
v9 Natural−Robotic (Large) 1.7559 1.7926 1.8109 1.8245 1.8275 +0.0716 N=25
v10 Authentic−Cheap (Large) 1.8424 1.8812 1.9048 1.9199 1.9217 +0.0793 N=25
v11 Professional−Distorted (Large) 1.7442 1.7871 1.8136 1.8248 1.8271 +0.0828 N=50
v12 Expressive−Flat (Large) 1.7568 1.7898 1.8107 1.8367 1.8419 +0.0851 N=25
v13 FullPos−FullNeg (Large) 1.7593 1.7959 1.8217 1.8302 1.8325 +0.0732 N=25
v14 Warm−Robotic (Large) 1.6689 1.7077 1.7273 1.7379 1.7454 +0.0765 N=25
v15 Natural−Robotic (Small) 1.8332 1.8758 1.9142 1.9200 1.9358 +0.1026 N=50
v16 Authentic−Cheap (Small) 1.8533 1.8982 1.9543 1.9610 1.9857 +0.1324 N=50
v17 Professional−Distorted (Small) 1.7020 1.7347 1.7577 1.7740 1.7744 +0.0725 N=25
v18 Expressive−Flat (Small) 1.5764 1.6263 1.6467 1.6868 1.7039 +0.1275 N=25
v19 FullPos−FullNeg (Small) 1.6394 1.6768 1.7053 1.7219 1.7244 +0.0849 N=50
v20 Warm−Robotic (Small) 1.7189 1.7620 1.8005 1.8227 1.8289 +0.1100 N=50
v21 Sanitized Prompt (Large) 1.1563 1.1783 1.1981 1.2061 1.2133 +0.0571 N=50
v22 Sanitized Prompt (Small) 0.8343 0.8474 0.8450 0.8684 0.8791 +0.0448 N=25
v23 Sanitized−Uncanny (Large) 1.9570 1.9875 2.0213 2.0312 2.0425 +0.0856 N=50
v24 Sanitized−Uncanny (Small) 1.7907 1.8339 1.8565 1.9014 1.9077 +0.1170 N=25

Diminishing Returns — All Methods Overlaid

[Figure: six line-chart panels, one line per method, x-axis "N candidates" (N=5, 10, 25, 50, 100). Panels: Original (5): Standard, VoiceCLAP-Large/Small × Quality Text, VoiceCLAP-Large/Small × Prompt Match; Positive-only, Large (5): v1 to v5; Positive-only, Small (3): v6 to v8; Pos−Neg, Large (6): v9 to v14; Pos−Neg, Small (6): v15 to v20; Sanitized Prompt (4): v21 to v24.]

Marginal Improvement per Additional Candidate

Method N=5→10 N=10→25 N=25→50 N=50→100
Standard: (1−WER) × Content Enjoyment 0.01971/cand (2.5%) 0.00200/cand (0.7%) 0.00313/cand (1.9%) 0.00048/cand (0.6%)
VoiceCLAP-Large × Quality Text 0.00492/cand (2.7%) 0.00095/cand (1.5%) 0.00024/cand (0.6%) 0.00005/cand (0.2%)
VoiceCLAP-Small × Quality Text 0.00272/cand (1.8%) 0.00118/cand (2.3%) 0.00049/cand (1.6%) 0.00021/cand (1.3%)
VoiceCLAP-Large × Prompt Match 0.00529/cand (1.9%) 0.00093/cand (1.0%) 0.00067/cand (1.2%) 0.00008/cand (0.3%)
VoiceCLAP-Small × Prompt Match 0.00381/cand (2.2%) 0.00003/cand (0.0%) 0.00085/cand (2.4%) 0.00019/cand (1.1%)
v1 Natural (Large) 0.00534/cand (2.7%) 0.00106/cand (1.6%) 0.00035/cand (0.9%) 0.00005/cand (0.2%)
v2 Authentic (Large) 0.00511/cand (2.5%) 0.00090/cand (1.3%) 0.00042/cand (1.0%) 0.00001/cand (0.0%)
v3 Professional (Large) 0.00518/cand (3.0%) 0.00085/cand (1.4%) 0.00030/cand (0.8%) 0.00003/cand (0.2%)
v4 Expressive (Large) 0.00356/cand (1.7%) 0.00090/cand (1.3%) 0.00071/cand (1.7%) 0.00003/cand (0.1%)
v5 Cinematic (Large) 0.00289/cand (1.4%) 0.00092/cand (1.4%) 0.00051/cand (1.3%) 0.00004/cand (0.2%)
v6 Natural (Small) 0.00594/cand (3.8%) 0.00147/cand (2.7%) 0.00056/cand (1.7%) 0.00021/cand (1.2%)
v7 Authentic (Small) 0.00486/cand (3.2%) 0.00190/cand (3.7%) 0.00019/cand (0.6%) 0.00022/cand (1.4%)
v8 Professional (Small) 0.00285/cand (2.3%) 0.00054/cand (1.3%) 0.00064/cand (2.5%) 0.00014/cand (1.0%)
v9 Natural−Robotic (Large) 0.00735/cand (2.1%) 0.00122/cand (1.0%) 0.00054/cand (0.7%) 0.00006/cand (0.2%)
v10 Authentic−Cheap (Large) 0.00774/cand (2.1%) 0.00158/cand (1.3%) 0.00060/cand (0.8%) 0.00004/cand (0.1%)
v11 Professional−Distorted (Large) 0.00857/cand (2.5%) 0.00176/cand (1.5%) 0.00045/cand (0.6%) 0.00005/cand (0.1%)
v12 Expressive−Flat (Large) 0.00661/cand (1.9%) 0.00139/cand (1.2%) 0.00104/cand (1.4%) 0.00010/cand (0.3%)
v13 FullPos−FullNeg (Large) 0.00733/cand (2.1%) 0.00172/cand (1.4%) 0.00034/cand (0.5%) 0.00005/cand (0.1%)
v14 Warm−Robotic (Large) 0.00775/cand (2.3%) 0.00131/cand (1.2%) 0.00042/cand (0.6%) 0.00015/cand (0.4%)
v15 Natural−Robotic (Small) 0.00851/cand (2.3%) 0.00256/cand (2.1%) 0.00023/cand (0.3%) 0.00032/cand (0.8%)
v16 Authentic−Cheap (Small) 0.00897/cand (2.4%) 0.00375/cand (3.0%) 0.00027/cand (0.3%) 0.00049/cand (1.3%)
v17 Professional−Distorted (Small) 0.00655/cand (1.9%) 0.00154/cand (1.3%) 0.00065/cand (0.9%) 0.00001/cand (0.0%)
v18 Expressive−Flat (Small) 0.00998/cand (3.2%) 0.00136/cand (1.3%) 0.00160/cand (2.4%) 0.00034/cand (1.0%)
v19 FullPos−FullNeg (Small) 0.00748/cand (2.3%) 0.00190/cand (1.7%) 0.00066/cand (1.0%) 0.00005/cand (0.1%)
v20 Warm−Robotic (Small) 0.00863/cand (2.5%) 0.00256/cand (2.2%) 0.00089/cand (1.2%) 0.00012/cand (0.3%)
v21 Sanitized Prompt (Large) 0.00441/cand (1.9%) 0.00132/cand (1.7%) 0.00032/cand (0.7%) 0.00015/cand (0.6%)
v22 Sanitized Prompt (Small) 0.00263/cand (1.6%) -0.00016/cand (-0.3%) 0.00094/cand (2.8%) 0.00021/cand (1.2%)
v23 Sanitized−Uncanny (Large) 0.00611/cand (1.6%) 0.00225/cand (1.7%) 0.00040/cand (0.5%) 0.00023/cand (0.6%)
v24 Sanitized−Uncanny (Small) 0.00864/cand (2.4%) 0.00151/cand (1.2%) 0.00180/cand (2.4%) 0.00013/cand (0.3%)
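The entries in this table appear to follow from the Avg Best values: the per-candidate figure is the gain over the interval divided by the number of added candidates, and the percentage is the gain relative to the interval's starting value. The sketch below recomputes the Standard row from the rounded table values, so the last digit can differ from the report (for example 0.01972 versus 0.01971). The function name `marginal_gains` is illustrative.

```python
def marginal_gains(ns, best):
    """Return (n_from, n_to, gain_per_candidate, percent_gain) per interval."""
    rows = []
    for i in range(len(ns) - 1):
        gain = best[i + 1] - best[i]
        per_candidate = gain / (ns[i + 1] - ns[i])
        percent = 100.0 * gain / best[i]
        rows.append((ns[i], ns[i + 1], per_candidate, percent))
    return rows


ns = [5, 10, 25, 50, 100]
# Avg Best for "Standard: (1-WER) x Content Enjoyment", from the comparison table.
standard = [3.9211, 4.0197, 4.0497, 4.1281, 4.1523]

for n_from, n_to, per_cand, pct in marginal_gains(ns, standard):
    print(f"N={n_from}->{n_to}: {per_cand:.5f}/cand ({pct:.1f}%)")
```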

Ablation: Pronunciation Suffix Effect

Comparing N=10 without suffix vs N=10 with suffix.

Ranking Method | Without Suffix (N=10): Mean | Best | Median | With Suffix (N=10): Mean | Best | Median | Δ Mean | Δ Best
Standard: (1−WER) × Content Enjoyment | 3.6567 | 4.0145 | 3.6765 | 3.6493 | 4.0168 | 3.6526 | -0.0074 | +0.0023
VoiceCLAP-Large × Quality Text | 0.8567 | 0.9328 | 0.8629 | 0.7588 | 0.8205 | 0.7635 | -0.0980 | -0.1123
VoiceCLAP-Small × Quality Text | 0.6747 | 0.7832 | 0.6839 | 0.7588 | 0.8205 | 0.7635 | +0.0840 | +0.0374
VoiceCLAP-Large × Prompt Match | 1.3057 | 1.4302 | 1.3155 | 0.7588 | 0.8205 | 0.7635 | -0.5469 | -0.6096
VoiceCLAP-Small × Prompt Match | 0.7621 | 0.8696 | 0.7591 | 0.7588 | 0.8205 | 0.7635 | -0.0033 | -0.0490
v1 Natural (Large) | 0.9153 | 1.0021 | 0.9227 | 0.7588 | 0.8205 | 0.7635 | -0.1565 | -0.1816
v2 Authentic (Large) | 0.9506 | 1.0405 | 0.9580 | 0.7588 | 0.8205 | 0.7635 | -0.1918 | -0.2199
v3 Professional (Large) | 0.8064 | 0.8780 | 0.8119 | 0.7588 | 0.8205 | 0.7635 | -0.0477 | -0.0575
v4 Expressive (Large) | 0.9475 | 1.0408 | 0.9538 | 0.7588 | 0.8205 | 0.7635 | -0.1887 | -0.2203
v5 Cinematic (Large) | 0.9211 | 1.0098 | 0.9252 | 0.7588 | 0.8205 | 0.7635 | -0.1623 | -0.1893
v6 Natural (Small) | 0.7235 | 0.8275 | 0.7280 | 0.7588 | 0.8205 | 0.7635 | +0.0352 | -0.0069
v7 Authentic (Small) | 0.6886 | 0.7783 | 0.6950 | 0.7588 | 0.8205 | 0.7635 | +0.0701 | +0.0422
v8 Professional (Small) | 0.5635 | 0.6468 | 0.5711 | 0.7588 | 0.8205 | 0.7635 | +0.1952 | +0.1737
v9 Natural−Robotic (Large) | 1.6489 | 1.8054 | 1.6572 | 1.5175 | 1.6410 | 1.5270 | -0.1314 | -0.1644
v10 Authentic−Cheap (Large) | 1.7297 | 1.8894 | 1.7401 | 1.5175 | 1.6410 | 1.5270 | -0.2122 | -0.2484
v11 Professional−Distorted (Large) | 1.6383 | 1.7835 | 1.6466 | 1.5175 | 1.6410 | 1.5270 | -0.1208 | -0.1425
v12 Expressive−Flat (Large) | 1.6351 | 1.7935 | 1.6423 | 1.5175 | 1.6410 | 1.5270 | -0.1175 | -0.1525
v13 FullPos−FullNeg (Large) | 1.6536 | 1.8030 | 1.6631 | 1.5175 | 1.6410 | 1.5270 | -0.1361 | -0.1620
v14 Warm−Robotic (Large) | 1.5721 | 1.7130 | 1.5773 | 1.5175 | 1.6410 | 1.5270 | -0.0546 | -0.0720
v15 Natural−Robotic (Small) | 1.7134 | 1.8816 | 1.7227 | 1.5175 | 1.6410 | 1.5270 | -0.1959 | -0.2406
v16 Authentic−Cheap (Small) | 1.7184 | 1.9078 | 1.7342 | 1.5175 | 1.6410 | 1.5270 | -0.2009 | -0.2668
v17 Professional−Distorted (Small) | 1.5827 | 1.7431 | 1.5878 | 1.5175 | 1.6410 | 1.5270 | -0.0652 | -0.1021
v18 Expressive−Flat (Small) | 1.4598 | 1.6350 | 1.4590 | 1.5175 | 1.6410 | 1.5270 | +0.0577 | +0.0060
v19 FullPos−FullNeg (Small) | 1.5350 | 1.6829 | 1.5439 | 1.5175 | 1.6410 | 1.5270 | -0.0175 | -0.0419
v20 Warm−Robotic (Small) | 1.5976 | 1.7702 | 1.5925 | 1.5175 | 1.6410 | 1.5270 | -0.0801 | -0.1292
v21 Sanitized Prompt (Large) | 1.0789 | 1.1856 | 1.0866 | 0.7588 | 0.8205 | 0.7635 | -0.3201 | -0.3651
v22 Sanitized Prompt (Small) | 0.7356 | 0.8370 | 0.7342 | 0.7588 | 0.8205 | 0.7635 | +0.0231 | -0.0165
v23 Sanitized−Uncanny (Large) | 1.8337 | 2.0140 | 1.8456 | 1.5175 | 1.6410 | 1.5270 | -0.3162 | -0.3730
v24 Sanitized−Uncanny (Small) | 1.6489 | 1.8444 | 1.6493 | 1.5175 | 1.6410 | 1.5270 | -0.1314 | -0.2034

Per-Prompt Ablation: Standard Reward (N=10)

# | Lang | No Suffix Mean | No Suffix Best | With Suffix Mean | With Suffix Best | Δ Mean | Δ Best
0 | English | 4.4127 | 4.6337 | 4.3077 | 4.8373 | -0.1050 | +0.2036
1 | French | 4.7646 | 4.9466 | 4.7331 | 4.9823 | -0.0316 | +0.0357
2 | English | 1.1174 | 1.2710 | 1.1804 | 1.2948 | +0.0630 | +0.0238
3 | German | 4.7972 | 4.8971 | 4.7827 | 4.8982 | -0.0145 | +0.0011
4 | French | 4.5382 | 4.7537 | 4.5600 | 4.8114 | +0.0218 | +0.0577
5 | French | 3.5284 | 4.1725 | 3.7767 | 4.4835 | +0.2483 | +0.3110
6 | English | 3.0369 | 3.7504 | 3.1401 | 3.6910 | +0.1032 | -0.0594
7 | German | 2.2224 | 2.8868 | 2.1253 | 2.5059 | -0.0972 | -0.3809
8 | Spanish | 4.7732 | 5.0789 | 4.7247 | 4.8181 | -0.0485 | -0.2608
9 | French | 3.3758 | 3.7541 | 3.1628 | 3.8451 | -0.2130 | +0.0910
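The deltas in this table are the with-suffix value minus the no-suffix value. A short sketch that recomputes them from the rounded per-prompt values and counts how many prompts improve; the variable and function names are illustrative, and results can differ from the report in the last digit because the inputs are already rounded.

```python
# (lang, no_suffix_mean, no_suffix_best, with_suffix_mean, with_suffix_best)
# Values copied from the per-prompt ablation table for the standard reward.
rows = [
    ("English", 4.4127, 4.6337, 4.3077, 4.8373),
    ("French",  4.7646, 4.9466, 4.7331, 4.9823),
    ("English", 1.1174, 1.2710, 1.1804, 1.2948),
    ("German",  4.7972, 4.8971, 4.7827, 4.8982),
    ("French",  4.5382, 4.7537, 4.5600, 4.8114),
    ("French",  3.5284, 4.1725, 3.7767, 4.4835),
    ("English", 3.0369, 3.7504, 3.1401, 3.6910),
    ("German",  2.2224, 2.8868, 2.1253, 2.5059),
    ("Spanish", 4.7732, 5.0789, 4.7247, 4.8181),
    ("French",  3.3758, 3.7541, 3.1628, 3.8451),
]

# Delta = with suffix - without suffix, per prompt.
delta_mean = [w_mean - n_mean for _, n_mean, _, w_mean, _ in rows]
delta_best = [w_best - n_best for _, _, n_best, _, w_best in rows]

improved_mean = sum(d > 0 for d in delta_mean)
improved_best = sum(d > 0 for d in delta_best)
avg_delta_mean = sum(delta_mean) / len(rows)
avg_delta_best = sum(delta_best) / len(rows)

print(f"prompts with higher mean: {improved_mean}/{len(rows)}")
print(f"prompts with higher best: {improved_best}/{len(rows)}")
print(f"average delta mean: {avg_delta_mean:+.4f}")
print(f"average delta best: {avg_delta_best:+.4f}")
```

The averages agree, up to rounding, with the Standard row of the method-level ablation table (Δ Mean -0.0074, Δ Best +0.0023).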

Standard: (1−WER) × Content Enjoyment — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 3.9211 4.0197 4.0497 4.1281 4.1523
Std Dev 1.2129 1.2510 1.2175 1.2375 1.2120
Avg Mean 3.6636 3.6884 3.6003 3.6224 3.6181

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 4.7776 4.7938 4.8550 4.8415 4.8550
1 French 4.8704 5.0228 5.0776 5.0776 5.0776
2 English 1.2889 1.3107 1.3004 1.2832 1.3107
3 German 4.8286 4.9408 4.9466 4.9466 4.9687
4 French 4.6758 4.7537 4.7341 4.7982 4.7982
5 French 4.0560 4.3700 4.3700 4.6713 4.6713
6 English 3.8028 3.7903 3.7252 3.9250 3.9250
7 German 2.4948 2.4318 2.7525 2.7623 2.9413
8 Spanish 4.9337 5.0161 4.9645 5.0789 5.0789
9 French 3.4829 3.7670 3.7712 3.8962 3.8962
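The Avg Best and Std Dev rows can be recovered from the per-prompt table. The sketch below does this for the N=5 column of the standard reward; the reported Std Dev matches the sample standard deviation (n − 1 denominator), which is what `statistics.stdev` computes. Variable names are illustrative.

```python
import statistics

# Per-prompt best reward at N=5 for the standard method (prompts 0 to 9).
best_n5 = [4.7776, 4.8704, 1.2889, 4.8286, 4.6758,
           4.0560, 3.8028, 2.4948, 4.9337, 3.4829]

avg_best = statistics.mean(best_n5)
# Sample standard deviation (divides by n - 1), which reproduces the report;
# the population form (statistics.pstdev) would give a smaller value.
std_dev = statistics.stdev(best_n5)

print(f"Avg Best: {avg_best:.4f}")
print(f"Std Dev:  {std_dev:.4f}")
```

The large spread is driven mostly by prompts 2 and 7, whose best rewards sit far below the other eight.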

VoiceCLAP-Large × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.9132 0.9378 0.9521 0.9581 0.9605
Std Dev 0.2877 0.2927 0.2786 0.2761 0.2770
Avg Mean 0.8536 0.8576 0.8394 0.8467 0.8453

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.1582 1.2039 1.2094 1.2039 1.2094
1 French 1.1698 1.1680 1.1825 1.1825 1.1825
2 English 0.3376 0.3430 0.3653 0.3653 0.3653
3 German 1.1379 1.1392 1.1497 1.1544 1.1544
4 French 1.2514 1.2718 1.2643 1.2616 1.2718
5 French 0.8476 0.9030 0.9030 0.9202 0.9202
6 English 0.8922 0.9158 0.9198 0.9198 0.9198
7 German 0.5721 0.5668 0.6462 0.6604 0.6685
8 Spanish 0.8912 0.9030 0.9193 0.9251 0.9251
9 French 0.8738 0.9634 0.9613 0.9876 0.9876

VoiceCLAP-Small × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.7602 0.7738 0.7914 0.8037 0.8143
Std Dev 0.2858 0.2947 0.2852 0.2913 0.2882
Avg Mean 0.6834 0.6801 0.6552 0.6653 0.6646

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.7934 0.8073 0.8373 0.8412 0.8569
1 French 1.0337 1.0299 1.0184 1.0643 1.0643
2 English 0.2656 0.2653 0.2673 0.2660 0.2673
3 German 1.0827 1.0635 1.0875 1.1005 1.1113
4 French 1.1299 1.1572 1.1713 1.1713 1.1713
5 French 0.5162 0.5541 0.6197 0.6036 0.6197
6 English 0.7566 0.7612 0.7731 0.7731 0.8069
7 German 0.4837 0.4344 0.5363 0.5298 0.5582
8 Spanish 0.6328 0.6664 0.6059 0.6664 0.6664
9 French 0.9072 0.9983 0.9973 1.0211 1.0211

VoiceCLAP-Large × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.3985 1.4249 1.4389 1.4555 1.4595
Std Dev 0.4499 0.4530 0.4418 0.4390 0.4376
Avg Mean 1.3025 1.3116 1.2775 1.2882 1.2877

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.7348 1.7763 1.7653 1.7775 1.7775
1 French 1.8340 1.8494 1.8364 1.8560 1.8560
2 English 0.4611 0.4625 0.4611 0.4655 0.4659
3 German 1.7875 1.8278 1.8309 1.8294 1.8347
4 French 1.5893 1.5869 1.6135 1.6175 1.6175
5 French 1.5492 1.6133 1.6329 1.6618 1.6618
6 English 1.3255 1.2953 1.3021 1.3255 1.3512
7 German 0.8214 0.8595 0.9457 0.9935 1.0018
8 Spanish 1.6502 1.6439 1.6496 1.6631 1.6631
9 French 1.2317 1.3342 1.3515 1.3655 1.3655

VoiceCLAP-Small × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.8512 0.8702 0.8706 0.8918 0.9013
Std Dev 0.2877 0.2879 0.2868 0.2893 0.2847
Avg Mean 0.7663 0.7682 0.7461 0.7538 0.7526

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.3195 1.3274 1.3233 1.3367 1.3367
1 French 0.8898 0.9041 0.9000 0.9000 0.9085
2 English 0.3192 0.2841 0.3029 0.3023 0.3192
3 German 1.1300 1.1490 1.1623 1.1907 1.1907
4 French 0.8509 0.8983 0.8778 0.9097 0.9097
5 French 0.8920 1.0036 1.0036 0.9609 1.0036
6 English 1.0857 0.9828 1.0145 1.0892 1.0892
7 German 0.5828 0.6266 0.6308 0.6702 0.6974
8 Spanish 0.7441 0.7398 0.7505 0.7717 0.7717
9 French 0.6980 0.7867 0.7406 0.7867 0.7867

v1 Natural (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.9764 1.0031 1.0191 1.0278 1.0301
Std Dev 0.2900 0.2936 0.2862 0.2853 0.2837
Avg Mean 0.9106 0.9142 0.8959 0.9024 0.9014

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.2753 1.3070 1.3020 1.3155 1.3155
1 French 1.2256 1.2280 1.2585 1.2585 1.2585
2 English 0.3817 0.3846 0.3986 0.3986 0.3986
3 German 1.1115 1.1171 1.1289 1.1289 1.1289
4 French 1.3096 1.3124 1.3213 1.3162 1.3213
5 French 0.9475 1.0200 1.0200 1.0337 1.0337
6 English 0.9296 0.9727 0.9906 0.9906 0.9906
7 German 0.6304 0.6356 0.6936 0.7113 0.7295
8 Spanish 0.9762 0.9799 0.9951 1.0209 1.0209
9 French 0.9769 1.0742 1.0820 1.1036 1.1036

v2 Authentic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.0138 1.0394 1.0529 1.0635 1.0638
Std Dev 0.2793 0.2889 0.2741 0.2746 0.2738
Avg Mean 0.9458 0.9512 0.9299 0.9381 0.9367

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.1419 1.1758 1.1785 1.1811 1.1811
1 French 1.2850 1.2929 1.2979 1.2979 1.2979
2 English 0.4014 0.3923 0.4102 0.4096 0.4129
3 German 1.1828 1.2004 1.1983 1.2011 1.2011
4 French 1.2851 1.3088 1.3066 1.3084 1.3088
5 French 1.0603 1.1132 1.1132 1.1470 1.1470
6 English 0.9537 0.9756 0.9730 0.9902 0.9902
7 German 0.6878 0.6898 0.7822 0.7958 0.7958
8 Spanish 1.1612 1.1747 1.1853 1.1952 1.1952
9 French 0.9792 1.0704 1.0842 1.1085 1.1085

v3 Professional (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.8590 0.8849 0.8977 0.9053 0.9067
Std Dev 0.2640 0.2688 0.2526 0.2513 0.2522
Avg Mean 0.8024 0.8067 0.7904 0.7970 0.7955

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.0653 1.1191 1.1217 1.1217 1.1217
1 French 1.0959 1.1022 1.1109 1.1109 1.1109
2 English 0.3224 0.3292 0.3578 0.3578 0.3578
3 German 1.0931 1.0896 1.0977 1.1065 1.1065
4 French 1.1510 1.1686 1.1652 1.1605 1.1686
5 French 0.7823 0.8424 0.8424 0.8601 0.8601
6 English 0.8638 0.8963 0.8904 0.8904 0.8963
7 German 0.5541 0.5473 0.6301 0.6391 0.6391
8 Spanish 0.8439 0.8552 0.8648 0.8825 0.8825
9 French 0.8183 0.8992 0.8958 0.9231 0.9231

v4 Expressive (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.0251 1.0429 1.0563 1.0740 1.0753
Std Dev 0.2986 0.2884 0.2893 0.2944 0.2924
Avg Mean 0.9443 0.9516 0.9309 0.9360 0.9348

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.2831 1.2912 1.2918 1.2918 1.2947
1 French 1.2773 1.2643 1.3222 1.3222 1.3222
2 English 0.3935 0.4168 0.4045 0.4091 0.4168
3 German 1.0329 1.0660 1.0660 1.0745 1.0749
4 French 1.2343 1.2362 1.2646 1.2948 1.2948
5 French 1.1254 1.1557 1.1559 1.2255 1.2255
6 English 1.1776 1.1656 1.1440 1.1776 1.1776
7 German 0.6135 0.6409 0.7016 0.7079 0.7101
8 Spanish 1.1684 1.1759 1.1781 1.1833 1.1833
9 French 0.9445 1.0161 1.0347 1.0530 1.0530

v5 Cinematic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.9964 1.0109 1.0247 1.0376 1.0394
Std Dev 0.2724 0.2761 0.2649 0.2702 0.2691
Avg Mean 0.9184 0.9236 0.9017 0.9087 0.9075

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.1097 1.1111 1.1278 1.1278 1.1313
1 French 1.2325 1.2483 1.2467 1.2661 1.2661
2 English 0.3990 0.3969 0.3965 0.4087 0.4087
3 German 1.0830 1.1234 1.1234 1.1196 1.1234
4 French 1.2276 1.2386 1.2463 1.2566 1.2566
5 French 1.1481 1.1758 1.1852 1.2318 1.2318
6 English 1.1073 1.0681 1.0774 1.1073 1.1073
7 German 0.6179 0.6321 0.7295 0.7186 0.7295
8 Spanish 1.0503 1.0536 1.0467 1.0595 1.0595
9 French 0.9890 1.0608 1.0677 1.0798 1.0798

v6 Natural (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.7862 0.8159 0.8379 0.8518 0.8621
Std Dev 0.2482 0.2556 0.2474 0.2531 0.2501
Avg Mean 0.7243 0.7259 0.7031 0.7109 0.7100

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.6108 0.6347 0.6967 0.6987 0.7277
1 French 1.0956 1.0962 1.1180 1.1404 1.1404
2 English 0.3171 0.3042 0.3166 0.3222 0.3222
3 German 1.0255 1.0044 1.0577 1.0503 1.0577
4 French 1.0307 1.0307 1.0817 1.0824 1.0824
5 French 0.6157 0.7201 0.7186 0.7417 0.7417
6 English 0.7917 0.8660 0.8145 0.8410 0.8845
7 German 0.5967 0.5642 0.6810 0.6609 0.6839
8 Spanish 0.8426 0.8955 0.8782 0.9215 0.9215
9 French 0.9360 1.0432 1.0160 1.0593 1.0593

v7 Authentic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.7493 0.7736 0.8021 0.8069 0.8180
Std Dev 0.2233 0.2350 0.2309 0.2282 0.2268
Avg Mean 0.6895 0.6906 0.6677 0.6769 0.6759

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.6287 0.6399 0.7065 0.7135 0.7255
1 French 1.0487 1.0506 1.0779 1.0779 1.0779
2 English 0.3194 0.3202 0.3207 0.3268 0.3268
3 German 0.9714 0.9818 1.0153 0.9954 1.0153
4 French 0.9504 0.9694 0.9932 1.0018 1.0018
5 French 0.6871 0.7827 0.7645 0.8041 0.8041
6 English 0.6475 0.7115 0.7146 0.7146 0.7555
7 German 0.5543 0.4889 0.5921 0.5906 0.6122
8 Spanish 0.8518 0.8811 0.8979 0.9061 0.9061
9 French 0.8337 0.9098 0.9381 0.9381 0.9549

v8 Professional (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.6313 0.6456 0.6537 0.6698 0.6767
Std Dev 0.2373 0.2383 0.2276 0.2291 0.2330
Avg Mean 0.5696 0.5666 0.5493 0.5564 0.5563

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.8213 0.8085 0.8463 0.8506 0.8596
1 French 0.8086 0.8257 0.8056 0.8390 0.8402
2 English 0.2282 0.2307 0.2376 0.2381 0.2394
3 German 0.8733 0.8627 0.8786 0.8831 0.8922
4 French 0.9374 0.9644 0.9253 0.9374 0.9644
5 French 0.3853 0.4351 0.4726 0.4569 0.4726
6 English 0.6150 0.6312 0.6484 0.6590 0.6590
7 German 0.4163 0.4004 0.4614 0.5038 0.5038
8 Spanish 0.5215 0.5338 0.4931 0.5338 0.5338
9 French 0.7063 0.7634 0.7680 0.7963 0.8020

v9 Natural−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7559 1.7926 1.8109 1.8245 1.8275
Std Dev 0.5045 0.5081 0.4897 0.4920 0.4902
Avg Mean 1.6442 1.6539 1.6134 1.6264 1.6253

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.1736 2.1736 2.1747 2.1848 2.1848
1 French 2.1287 2.1429 2.1472 2.1500 2.1505
2 English 0.6030 0.6056 0.6116 0.6123 0.6123
3 German 2.0797 2.0888 2.0877 2.0945 2.0945
4 French 2.2030 2.2117 2.2084 2.2203 2.2203
5 French 1.8193 1.9156 1.9156 1.9413 1.9413
6 English 1.6918 1.7193 1.7391 1.7391 1.7527
7 German 1.2064 1.2208 1.3700 1.3945 1.4109
8 Spanish 1.9343 1.9471 1.9591 1.9998 1.9998
9 French 1.7191 1.9010 1.8959 1.9081 1.9081

v10 Authentic−Cheap (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.8424 1.8812 1.9048 1.9199 1.9217
Std Dev 0.5175 0.5344 0.5107 0.5145 0.5129
Avg Mean 1.7253 1.7352 1.6914 1.7073 1.7052

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.9587 1.9875 1.9724 1.9875 1.9875
1 French 2.3190 2.3273 2.3388 2.3388 2.3388
2 English 0.6691 0.6567 0.6737 0.6737 0.6778
3 German 2.1445 2.1866 2.1777 2.1866 2.1866
4 French 2.3130 2.3429 2.3539 2.3539 2.3539
5 French 2.0317 2.1040 2.1151 2.1764 2.1764
6 English 1.7215 1.7313 1.7306 1.7548 1.7686
7 German 1.2738 1.2732 1.4685 1.4767 1.4767
8 Spanish 2.1688 2.1897 2.1943 2.2145 2.2145
9 French 1.8242 2.0123 2.0234 2.0364 2.0364

v11 Professional−Distorted (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7442 1.7871 1.8136 1.8248 1.8271
Std Dev 0.5054 0.5156 0.4874 0.4867 0.4879
Avg Mean 1.6298 1.6395 1.6025 1.6164 1.6138

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.1109 2.1886 2.2134 2.2134 2.2134
1 French 2.2090 2.2086 2.2248 2.2248 2.2248
2 English 0.6577 0.6694 0.6978 0.6978 0.6978
3 German 2.0262 2.0254 2.0450 2.0467 2.0467
4 French 2.2768 2.3032 2.2888 2.2982 2.3085
5 French 1.7806 1.8894 1.8894 1.9319 1.9319
6 English 1.7203 1.7584 1.7515 1.7588 1.7588
7 German 1.1253 1.1100 1.2946 1.3197 1.3197
8 Spanish 1.8646 1.8799 1.8694 1.8951 1.8951
9 French 1.6710 1.8382 1.8610 1.8617 1.8740

v12 Expressive−Flat (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7568 1.7898 1.8107 1.8367 1.8419
Std Dev 0.5213 0.5045 0.5028 0.5073 0.5068
Avg Mean 1.6297 1.6452 1.6049 1.6148 1.6134

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.2028 2.2028 2.1993 2.2028 2.2028
1 French 2.1866 2.1627 2.2576 2.2576 2.2576
2 English 0.5835 0.6100 0.6079 0.6168 0.6168
3 German 1.8672 1.9043 1.9078 1.9181 1.9483
4 French 2.0900 2.1050 2.1191 2.1535 2.1535
5 French 1.9551 2.0159 2.0159 2.1084 2.1084
6 English 1.9193 1.9044 1.8824 1.9230 1.9333
7 German 1.1260 1.1998 1.2958 1.3208 1.3324
8 Spanish 2.0324 2.0405 2.0544 2.0812 2.0812
9 French 1.6048 1.7527 1.7668 1.7846 1.7846

v13 FullPos−FullNeg (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7593 1.7959 1.8217 1.8302 1.8325
Std Dev 0.5255 0.5299 0.5103 0.5108 0.5090
Avg Mean 1.6470 1.6573 1.6184 1.6314 1.6297

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.1385 2.1740 2.1722 2.1798 2.1798
1 French 2.2920 2.2888 2.3142 2.3142 2.3142
2 English 0.6112 0.6215 0.6366 0.6366 0.6366
3 German 2.0361 2.0500 2.0607 2.0607 2.0607
4 French 2.2520 2.2593 2.2678 2.2660 2.2678
5 French 1.8327 1.9253 1.9253 1.9678 1.9678
6 English 1.7101 1.7303 1.7422 1.7427 1.7427
7 German 1.1290 1.1300 1.2866 1.2994 1.3176
8 Spanish 1.8750 1.8928 1.9125 1.9359 1.9359
9 French 1.7162 1.8875 1.8988 1.8988 1.9018

v14 Warm−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6689 1.7077 1.7273 1.7379 1.7454
Std Dev 0.5072 0.5098 0.4944 0.4973 0.4949
Avg Mean 1.5668 1.5781 1.5391 1.5523 1.5510

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0806 2.1079 2.1412 2.1218 2.1412
1 French 2.0684 2.0769 2.0838 2.0832 2.0976
2 English 0.5469 0.5528 0.5596 0.5528 0.5596
3 German 2.0888 2.1274 2.1112 2.1265 2.1274
4 French 2.1134 2.1134 2.1140 2.1282 2.1282
5 French 1.6897 1.7711 1.7711 1.8079 1.8079
6 English 1.6168 1.6501 1.6694 1.6694 1.6694
7 German 1.1035 1.1190 1.2542 1.2641 1.2978
8 Spanish 1.8116 1.8217 1.8366 1.8682 1.8682
9 French 1.5695 1.7362 1.7319 1.7569 1.7569

v15 Natural−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.8332 1.8758 1.9142 1.9200 1.9358
Std Dev 0.5389 0.5460 0.5315 0.5315 0.5339
Avg Mean 1.7050 1.7186 1.6716 1.6867 1.6849

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.8996 1.9782 2.0171 2.0119 2.0586
1 French 2.4417 2.4035 2.4509 2.4509 2.4509
2 English 0.6400 0.6458 0.6537 0.6537 0.6537
3 German 2.2766 2.2829 2.2951 2.3095 2.3099
4 French 2.3222 2.3386 2.4143 2.3708 2.4143
5 French 1.8268 1.9063 1.9629 2.0193 2.0193
6 English 1.8211 1.8898 1.8184 1.8459 1.8898
7 German 1.2413 1.2137 1.4525 1.4290 1.4525
8 Spanish 2.0023 2.0249 1.9953 2.0268 2.0268
9 French 1.8603 2.0738 2.0820 2.0820 2.0820

v16 Authentic−Cheap (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.8533 1.8982 1.9543 1.9610 1.9857
Std Dev 0.5464 0.5686 0.5648 0.5588 0.5600
Avg Mean 1.7071 1.7279 1.6767 1.6943 1.6907

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.7692 1.7692 1.8370 1.8730 1.9137
1 French 2.5375 2.4476 2.5375 2.5488 2.5488
2 English 0.6677 0.6403 0.6534 0.6537 0.6677
3 German 2.3305 2.3969 2.4266 2.4390 2.4390
4 French 2.1874 2.2907 2.3943 2.3109 2.3943
5 French 2.0455 2.1373 2.1570 2.2059 2.2059
6 English 1.6615 1.7937 1.7499 1.7499 1.8007
7 German 1.3056 1.2638 1.4821 1.5207 1.5207
8 Spanish 2.1587 2.2565 2.2621 2.2649 2.2649
9 French 1.8695 1.9855 2.0435 2.0435 2.1015

v17 Professional−Distorted (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7020 1.7347 1.7577 1.7740 1.7744
Std Dev 0.5263 0.5350 0.5146 0.5173 0.5176
Avg Mean 1.5803 1.5863 1.5439 1.5592 1.5577

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0713 2.1246 2.1426 2.1426 2.1426
1 French 2.1997 2.1827 2.1551 2.1993 2.1998
2 English 0.5479 0.5369 0.5557 0.5586 0.5586
3 German 2.0820 2.0620 2.0820 2.0955 2.0992
4 French 2.2254 2.2799 2.2824 2.2824 2.2824
5 French 1.6914 1.7538 1.8142 1.8170 1.8170
6 English 1.6814 1.6787 1.6857 1.7444 1.7444
7 German 1.1067 1.1287 1.2695 1.2812 1.2812
8 Spanish 1.7463 1.7479 1.7576 1.7615 1.7615
9 French 1.6674 1.8516 1.8325 1.8574 1.8574

v18 Expressive−Flat (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.5764 1.6263 1.6467 1.6868 1.7039
Std Dev 0.4879 0.4754 0.4767 0.4709 0.4733
Avg Mean 1.4591 1.4602 1.4323 1.4394 1.4391

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0212 2.0794 2.0556 2.0456 2.0794
1 French 1.8202 1.7948 1.9501 1.9501 1.9501
2 English 0.5053 0.5463 0.5266 0.5463 0.5466
3 German 1.7204 1.8126 1.8176 1.8234 1.8234
4 French 1.8198 1.8198 1.9155 1.9016 1.9155
5 French 1.6916 1.7813 1.7856 1.9195 1.9195
6 English 1.6406 1.6957 1.6320 1.7379 1.7416
7 German 0.9830 1.0688 1.1472 1.2472 1.2472
8 Spanish 2.1011 2.0943 2.0439 2.1034 2.1034
9 French 1.4611 1.5701 1.5930 1.5930 1.7122

v19 FullPos−FullNeg (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6394 1.6768 1.7053 1.7219 1.7244
Std Dev 0.4773 0.4861 0.4686 0.4662 0.4669
Avg Mean 1.5296 1.5378 1.4985 1.5121 1.5109

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.8933 1.9317 1.9910 1.9910 1.9910
1 French 2.1025 2.1021 2.0847 2.1204 2.1204
2 English 0.5753 0.5721 0.5782 0.5926 0.5926
3 German 2.0274 1.9954 2.0274 2.0190 2.0274
4 French 2.0485 2.1151 2.0990 2.1112 2.1151
5 French 1.6047 1.6975 1.7564 1.7714 1.7714
6 English 1.5931 1.6588 1.6046 1.6462 1.6588
7 German 1.1161 1.0997 1.2818 1.3035 1.3035
8 Spanish 1.8430 1.8430 1.8656 1.8656 1.8656
9 French 1.5902 1.7528 1.7639 1.7977 1.7977

v20 Warm−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7189 1.7620 1.8005 1.8227 1.8289
Std Dev 0.5457 0.5518 0.5512 0.5440 0.5498
Avg Mean 1.5933 1.6016 1.5596 1.5732 1.5714

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0912 2.1808 2.2270 2.2070 2.2270
1 French 2.2332 2.1770 2.2332 2.2332 2.2332
2 English 0.5420 0.5532 0.5588 0.5659 0.5659
3 German 2.1429 2.1766 2.1837 2.2188 2.2188
4 French 2.2308 2.2308 2.3380 2.2960 2.3380
5 French 1.7326 1.8033 1.8033 1.8954 1.8954
6 English 1.6905 1.7608 1.7135 1.7750 1.7750
7 German 1.0509 1.0436 1.2098 1.2258 1.2258
8 Spanish 1.7668 1.7853 1.7630 1.8356 1.8356
9 French 1.7081 1.9089 1.9744 1.9744 1.9744
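
The per-prompt values are not always monotone in N (prompt 0 above reaches 2.2270 at N=25 but 2.2070 at N=50), which would be consistent with each N column being scored on a separately drawn subset of the 100 candidates; the report does not say how the subsets were chosen. As a point of comparison only, and not as the method used here, the sketch below computes the exact expected best reward over all size-N subsets of a candidate pool, which is monotone in N by construction. The function name and the toy pool are hypothetical.

```python
from math import comb


def expected_best_of_n(rewards, n):
    """Exact expected maximum over all size-n subsets drawn without replacement."""
    m = len(rewards)
    if not 1 <= n <= m:
        raise ValueError("n must be between 1 and the pool size")
    ordered = sorted(rewards)
    # The value at sorted index i (0-based) is the subset maximum exactly when
    # the other n-1 members are chosen from the i values ranked below it.
    weighted = sum(value * comb(i, n - 1) for i, value in enumerate(ordered))
    return weighted / comb(m, n)


# Toy pool of rewards (illustrative values, not taken from the report).
pool = [0.4, 1.1, 1.3, 1.7, 2.0, 2.2]
curve = [expected_best_of_n(pool, n) for n in range(1, len(pool) + 1)]
for n, value in enumerate(curve, start=1):
    print(f"N={n}: expected best {value:.4f}")
```

With N=1 this reduces to the pool mean and with N equal to the pool size it returns the pool maximum, so the curve always rises from the average candidate toward the single best one.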

v21 Sanitized Prompt (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.1563 1.1783 1.1981 1.2061 1.2133
Std Dev 0.3575 0.3593 0.3532 0.3492 0.3466
Avg Mean 1.0756 1.0813 1.0557 1.0632 1.0623

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.5672 1.5740 1.5768 1.5808 1.5879
1 French 1.4746 1.4709 1.5038 1.5038 1.5038
2 English 0.4306 0.4283 0.4310 0.4330 0.4458
3 German 1.3502 1.3902 1.3987 1.3817 1.3987
4 French 1.2544 1.2620 1.2808 1.2993 1.2993
5 French 1.3296 1.3911 1.4057 1.4213 1.4213
6 English 1.2152 1.2188 1.2114 1.2237 1.2482
7 German 0.6519 0.6742 0.7445 0.7784 0.7897
8 Spanish 1.2399 1.2323 1.2370 1.2475 1.2475
9 French 1.0491 1.1412 1.1912 1.1912 1.1912

v22 Sanitized Prompt (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.8343 0.8474 0.8450 0.8684 0.8791
Std Dev 0.2867 0.2840 0.2880 0.2861 0.2842
Avg Mean 0.7475 0.7419 0.7211 0.7283 0.7283

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.3355 1.3466 1.3529 1.3483 1.3529
1 French 0.8675 0.8810 0.8792 0.8801 0.8852
2 English 0.3241 0.2935 0.3057 0.3064 0.3241
3 German 1.0421 1.0477 1.0710 1.0743 1.1084
4 French 0.7756 0.7756 0.7853 0.8179 0.8179
5 French 0.8878 0.9857 0.9857 0.9709 0.9857
6 English 1.1032 1.0235 1.0265 1.1181 1.1181
7 German 0.5729 0.6159 0.6138 0.6556 0.6856
8 Spanish 0.7867 0.7790 0.7439 0.7867 0.7878
9 French 0.6474 0.7256 0.6859 0.7256 0.7256

v23 Sanitized−Uncanny (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.9570 1.9875 2.0213 2.0312 2.0425
Std Dev 0.5924 0.5927 0.5796 0.5772 0.5721
Avg Mean 1.8276 1.8389 1.7934 1.8075 1.8061
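
To make the diminishing returns concrete, the Avg Best row above can be turned into incremental gains between successive values of N. This is a small illustrative sketch using only the numbers in that row; the helper name `incremental_gains` is hypothetical.

```python
# Avg Best for v23 Sanitized-Uncanny (Large), copied from the row above.
avg_best = [(5, 1.9570), (10, 1.9875), (25, 2.0213), (50, 2.0312), (100, 2.0425)]


def incremental_gains(points):
    """Gain in Avg Best when moving from each N to the next larger N."""
    return [
        (n_hi, round(best_hi - best_lo, 4))
        for (_, best_lo), (n_hi, best_hi) in zip(points, points[1:])
    ]


gains = incremental_gains(avg_best)
total_gain = avg_best[-1][1] - avg_best[0][1]
relative_gain = total_gain / avg_best[0][1]

for n, gain in gains:
    print(f"up to N={n}: +{gain:.4f}")
print(f"N=5 to N=100: +{total_gain:.4f} ({relative_gain:.1%})")
```

Going from N=5 to N=100, a 20-fold increase in candidates, raises Avg Best by 0.0855, roughly 4.4%. The two steps beyond N=25 each add about 0.01, compared with about 0.03 for each of the first two steps.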

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.5489 2.5576 2.5751 2.5751 2.5751
1 French 2.4663 2.4520 2.4953 2.4953 2.4953
2 English 0.6718 0.6710 0.6773 0.6779 0.6916
3 German 2.3168 2.3578 2.3692 2.3506 2.3692
4 French 2.2387 2.2399 2.2572 2.2780 2.2780
5 French 2.1723 2.2625 2.2870 2.3178 2.3178
6 English 1.9706 1.9576 1.9602 1.9802 2.0415
7 German 1.1984 1.2290 1.3814 1.4138 1.4337
8 Spanish 2.1705 2.1750 2.1815 2.1948 2.1948
9 French 1.8154 1.9727 2.0283 2.0283 2.0283

v24 Sanitized−Uncanny (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7907 1.8339 1.8565 1.9014 1.9077
Std Dev 0.5481 0.5641 0.5515 0.5483 0.5450
Avg Mean 1.6408 1.6505 1.6121 1.6259 1.6234

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.5327 2.5327 2.5217 2.5356 2.5356
1 French 2.1469 2.1527 2.1980 2.1980 2.1980
2 English 0.6129 0.5430 0.5813 0.5936 0.6129
3 German 2.2187 2.2424 2.2300 2.3312 2.3312
4 French 1.9258 2.0512 2.1278 2.0837 2.1278
5 French 1.9517 2.1434 2.1434 2.1468 2.1468
6 English 1.9735 1.9077 1.8959 2.0760 2.0760
7 German 1.2437 1.3207 1.3930 1.5221 1.5221
8 Spanish 1.7914 1.7675 1.8137 1.8490 1.8490
9 French 1.5095 1.6777 1.6600 1.6777 1.6777

Prompt #0 — English (Silicon Valley accent)

Language: English Accent: Silicon Valley accent Scored: 100/100
DramaBox Prompt
A young woman, possessing an extremely high fundamental frequency and bright, delicate harmonic texture, with a brisk, elevated momentum and a Silicon Valley accent; this is a pristine, high-quality studio voice recording with no background noise. She delivers the lines with a teasing lightness that occasionally borders on nervous energy, punctuated by small moments of genuine relief. (A brief, high-pitched Giggle escapes as she begins.) "Honestly, you think finding a solid Firestone review is that hard? Boggle, really. But look, that Lys thing actually worked." (She pauses, a subtle Contemplation washing over her features, then manages a slight, contained Chuckle.) "Just wait, I'll show you." She concludes with a soft, almost satisfied sigh, allowing the tension to dissipate.

Prompt #1 — French

Language: French Scored: 100/100
DramaBox Prompt
High-pitched, delicately resonant, and possessing the slightly strained clarity of a young adult female soprano; the voice is bright and purely head-dominant, engineered for intimate projection.

Pauses briefly, gathering strength. "Malgré la profondeur de cette sombre forêt, je sens toujours cette confiance absolue en mon chemin, guidée par la lumière."
A slight, almost imperceptible hardening of tone. "Même au cœur de cette nuit insondable, ma boussole intérieure me montre la seule direction véritable."
She finishes, a note of unwavering certainty settling.

The pace remains glacially slow throughout the utterance. The delivery conveys immense, quiet self-assurance.