Scenema Audio — Best-of-N: Diminishing Returns

10 Path A prompts × 100 candidates, 29 ranking methods

Methodology: Ranking Method Formulas & Text Prompts

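All methods use reward = (1 − WER) × max(score, 0), where WER is the word error rate; only the score term differs between methods, as listed below. In the keys, the l/s in clap_l*/clap_s* and the _L/_S suffixes denote VoiceCLAP-Large and VoiceCLAP-Small.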
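A minimal Python sketch of this shared reward and of best-of-N selection, assuming each candidate already carries a WER value and a method score. `Candidate`, `reward` and `best_of_n` are hypothetical names for illustration, not the pipeline's API. As in the formula, only the score is clamped; 1 − WER is used as-is.

```python
from __future__ import annotations

from typing import NamedTuple, Sequence


class Candidate(NamedTuple):
    """Hypothetical record for one generated audio candidate."""
    name: str
    wer: float    # word error rate for this candidate
    score: float  # method-specific score (e.g. Content Enjoyment, CLAP similarity)


def reward(wer: float, score: float) -> float:
    """reward = (1 - WER) * max(score, 0); a negative score yields 0."""
    return (1.0 - wer) * max(score, 0.0)


def best_of_n(candidates: Sequence[Candidate]) -> tuple[Candidate, float]:
    """Pick the highest-reward candidate and return it with its reward."""
    best = max(candidates, key=lambda c: reward(c.wer, c.score))
    return best, reward(best.wer, best.score)


if __name__ == "__main__":
    pool = [
        Candidate("a", wer=0.10, score=4.0),   # 0.9 * 4.0 = 3.6
        Candidate("b", wer=0.00, score=3.5),   # 1.0 * 3.5 = 3.5
        Candidate("c", wer=0.05, score=-0.2),  # score clamped to 0 -> reward 0
    ]
    chosen, r = best_of_n(pool)
    print(chosen.name, round(r, 4))  # a 3.6
```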

# | Key | Score Formula | Text Prompt(s)
0 | standard | Content Enjoyment | (none)
1 | clap_lq | cos(audio, quality_text) | "pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
2 | clap_sq | cos(audio, quality_text) | "pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
3 | clap_lp | cos(audio, prompt) | Original DramaBox prompt
4 | clap_sp | cos(audio, prompt) | Original DramaBox prompt
5 | v1_nat_L | cos(audio, nat) | "natural, spontaneous, lifelike speech with genuine emotion"
6 | v2_auth_L | cos(audio, auth) | "authentic, emotionally truthful, deeply felt voice performance"
7 | v3_pro_L | cos(audio, pro) | "professional studio recording, crystal clear high-fidelity audio"
8 | v4_expr_L | cos(audio, expr) | "expressive, dynamic voice acting with rich emotional range"
9 | v5_cine_L | cos(audio, cine) | "immersive cinematic narration, compelling storytelling"
10 | v6_nat_S | cos(audio, nat) | "natural, spontaneous, lifelike speech with genuine emotion"
11 | v7_auth_S | cos(audio, auth) | "authentic, emotionally truthful, deeply felt voice performance"
12 | v8_pro_S | cos(audio, pro) | "professional studio recording, crystal clear high-fidelity audio"
13 | v9_nr_L | cos(audio, nat) − cos(audio, rob) | + "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
14 | v10_ac_L | cos(audio, auth) − cos(audio, cheap) | + "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
15 | v11_pd_L | cos(audio, pro) − cos(audio, dist) | + "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
16 | v12_ef_L | cos(audio, expr) − cos(audio, flat) | + "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
17 | v13_ff_L | cos(audio, full_pos) − cos(audio, full_neg) | + "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
18 | v14_wr_L | cos(audio, warm) − cos(audio, rob) | + "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
19 | v15_nr_S | cos(audio, nat) − cos(audio, rob) | + "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
20 | v16_ac_S | cos(audio, auth) − cos(audio, cheap) | + "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
21 | v17_pd_S | cos(audio, pro) − cos(audio, dist) | + "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
22 | v18_ef_S | cos(audio, expr) − cos(audio, flat) | + "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
23 | v19_ff_S | cos(audio, full_pos) − cos(audio, full_neg) | + "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
24 | v20_wr_S | cos(audio, warm) − cos(audio, rob) | + "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
25 | v21_san_L | cos(audio, sanitized_prompt) | Quoted speech removed (Large)
26 | v22_san_S | cos(audio, sanitized_prompt) | Quoted speech removed (Small)
27 | v23_snr_L | cos(audio, sanitized) − cos(audio, neg_san) | + sanitized prompt / − "robotic, distorted, uncanny" (Large)
28 | v24_snr_S | cos(audio, sanitized) − cos(audio, neg_san) | + sanitized prompt / − "robotic, distorted, uncanny" (Small)
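The score formulas above reduce to cosine similarity between an audio embedding and one or two text embeddings. A minimal Python sketch under these assumptions: embeddings are plain float vectors produced elsewhere (no VoiceCLAP API is assumed), and `sanitize_prompt` is a hypothetical stand-in that only strips straight double-quoted spans. One caveat: since 1 − WER ≤ 1, rewards above 1 for positive-only methods (e.g. 1.2924 for v1 Natural (Large), prompt 0) imply the reported scores are a rescaled similarity rather than raw cosine; that rescaling is not documented here and is left out of the sketch.

```python
import math
import re
from typing import Sequence

Vector = Sequence[float]


def cos(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def positive_score(audio: Vector, pos: Vector) -> float:
    """Positive-only methods: cos(audio, pos_text)."""
    return cos(audio, pos)


def contrastive_score(audio: Vector, pos: Vector, neg: Vector) -> float:
    """Pos-neg methods: cos(audio, pos_text) - cos(audio, neg_text)."""
    return cos(audio, pos) - cos(audio, neg)


def sanitize_prompt(prompt: str) -> str:
    """Hypothetical sanitizer: drop "quoted speech" and tidy whitespace."""
    stripped = re.sub(r'"[^"]*"', " ", prompt)
    return " ".join(stripped.split())


if __name__ == "__main__":
    audio = [0.6, 0.8, 0.0]    # toy embeddings, not real VoiceCLAP output
    natural = [0.8, 0.6, 0.0]
    robotic = [0.0, 0.6, 0.8]
    print(round(positive_score(audio, natural), 4))              # 0.96
    print(round(contrastive_score(audio, natural, robotic), 4))  # 0.48
    print(sanitize_prompt('She laughs, "No way!" then pauses.'))  # She laughs, then pauses.
```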

Cross-Method Diminishing Returns Comparison

Method N=5 N=10 N=25 N=50 N=100 Gain N=5→100 Knee Point
Standard: (1−WER) × Content Enjoyment 3.6402 4.0591 4.2778 4.3003 4.3509 +0.7107 N=50
VoiceCLAP-Large × Quality Text 0.8578 0.9465 0.9788 0.9977 1.0018 +0.1440 N=50
VoiceCLAP-Small × Quality Text 0.6678 0.7458 0.7714 0.7913 0.7979 +0.1302 N=100
VoiceCLAP-Large × Prompt Match 1.2582 1.4203 1.4750 1.4850 1.5013 +0.2430 N=50
VoiceCLAP-Small × Prompt Match 0.7316 0.8069 0.8436 0.8484 0.8696 +0.1380 N=50
v1 Natural (Large) 0.9245 1.0314 1.0628 1.0900 1.0968 +0.1722 N=100
v2 Authentic (Large) 0.9405 1.0561 1.1017 1.1205 1.1271 +0.1866 N=50
v3 Professional (Large) 0.8097 0.8920 0.9233 0.9415 0.9450 +0.1353 N=50
v4 Expressive (Large) 0.9601 1.0789 1.1310 1.1307 1.1498 +0.1896 N=50
v5 Cinematic (Large) 0.9141 1.0181 1.0645 1.0716 1.0897 +0.1756 N=50
v6 Natural (Small) 0.6865 0.7797 0.7999 0.8265 0.8493 +0.1628 N=100
v7 Authentic (Small) 0.6046 0.6918 0.7222 0.7402 0.7547 +0.1501 N=100
v8 Professional (Small) 0.5712 0.6338 0.6486 0.6738 0.6775 +0.1063 N=100
v9 Natural−Robotic (Large) 1.6291 1.8186 1.8798 1.9103 1.9195 +0.2904 N=50
v10 Authentic−Cheap (Large) 1.6832 1.8913 1.9613 1.9899 1.9996 +0.3164 N=50
v11 Professional−Distorted (Large) 1.6093 1.7896 1.8517 1.8897 1.8934 +0.2841 N=50
v12 Expressive−Flat (Large) 1.6379 1.8253 1.9079 1.9154 1.9357 +0.2977 N=50
v13 FullPos−FullNeg (Large) 1.6227 1.8137 1.8707 1.9040 1.9110 +0.2883 N=50
v14 Warm−Robotic (Large) 1.5354 1.7098 1.7688 1.7972 1.8013 +0.2659 N=50
v15 Natural−Robotic (Small) 1.6384 1.8358 1.9058 1.9259 1.9414 +0.3030 N=50
v16 Authentic−Cheap (Small) 1.6024 1.8099 1.8977 1.9041 1.9280 +0.3255 N=50
v17 Professional−Distorted (Small) 1.4980 1.6583 1.7254 1.7480 1.7686 +0.2706 N=50
v18 Expressive−Flat (Small) 1.5510 1.7345 1.8091 1.8207 1.8629 +0.3119 N=50
v19 FullPos−FullNeg (Small) 1.4562 1.6334 1.6862 1.7129 1.7295 +0.2733 N=50
v20 Warm−Robotic (Small) 1.5747 1.7640 1.8339 1.8643 1.8857 +0.3109 N=50
v21 Sanitized Prompt (Large) 1.0507 1.1807 1.2371 1.2467 1.2569 +0.2062 N=50
v22 Sanitized Prompt (Small) 0.7210 0.8015 0.8351 0.8340 0.8599 +0.1389 N=50
v23 Sanitized−Uncanny (Large) 1.7781 1.9943 2.0676 2.0926 2.1024 +0.3243 N=50
v24 Sanitized−Uncanny (Small) 1.5440 1.7067 1.7908 1.7898 1.8203 +0.2763 N=50
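The Gain column is Avg Best at N=100 minus Avg Best at N=5. The Python sketch below recomputes it for the Standard row and also expresses each N as a share of the N=100 value, one simple way to quantify the flattening. The report does not state how Knee Point is determined, so the sketch does not attempt it; the function names are illustrative.

```python
from __future__ import annotations

# Avg Best of the Standard method, copied from the table above.
STANDARD = {5: 3.6402, 10: 4.0591, 25: 4.2778, 50: 4.3003, 100: 4.3509}


def total_gain(avg_best: dict[int, float], lo: int = 5, hi: int = 100) -> float:
    """The "Gain N=5→100" column: Avg Best at N=hi minus Avg Best at N=lo."""
    return avg_best[hi] - avg_best[lo]


def share_of_final(avg_best: dict[int, float]) -> dict[int, float]:
    """Fraction of the largest-N Avg Best already reached at each N."""
    final = avg_best[max(avg_best)]
    return {n: avg_best[n] / final for n in sorted(avg_best)}


if __name__ == "__main__":
    print(f"gain N=5->100: {total_gain(STANDARD):+.4f}")  # +0.7107
    for n, share in share_of_final(STANDARD).items():
        print(f"N={n:>3}: {share:.1%} of the N=100 value")
```

For the Standard row this gives 83.7%, 93.3%, 98.3%, 98.8% and 100% at N=5, 10, 25, 50 and 100.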

Diminishing Returns — All Methods Overlaid

[Figure: best reward vs. N candidates (N = 5, 10, 25, 50, 100) in six panels. Original (5): Standard, VoiceCLAP-Large/Small × Quality Text, VoiceCLAP-Large/Small × Prompt Match. Positive-only, Large (5): v1 to v5. Positive-only, Small (3): v6 to v8. Pos−Neg, Large (6): v9 to v14. Pos−Neg, Small (6): v15 to v20. Sanitized Prompt (4): v21 to v24.]

Marginal Improvement per Additional Candidate

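Each cell is the gain in Avg Best per added candidate over that step, followed in parentheses by the step's total gain as a percentage of Avg Best at the smaller N.

Method N=5→10 N=10→25 N=25→50 N=50→100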
Standard: (1−WER) × Content Enjoyment 0.08379/cand (11.5%) 0.01457/cand (5.4%) 0.00090/cand (0.5%) 0.00101/cand (1.2%)
VoiceCLAP-Large × Quality Text 0.01773/cand (10.3%) 0.00215/cand (3.4%) 0.00076/cand (1.9%) 0.00008/cand (0.4%)
VoiceCLAP-Small × Quality Text 0.01560/cand (11.7%) 0.00171/cand (3.4%) 0.00079/cand (2.6%) 0.00013/cand (0.8%)
VoiceCLAP-Large × Prompt Match 0.03241/cand (12.9%) 0.00365/cand (3.9%) 0.00040/cand (0.7%) 0.00033/cand (1.1%)
VoiceCLAP-Small × Prompt Match 0.01506/cand (10.3%) 0.00244/cand (4.5%) 0.00019/cand (0.6%) 0.00042/cand (2.5%)
v1 Natural (Large) 0.02138/cand (11.6%) 0.00209/cand (3.0%) 0.00109/cand (2.6%) 0.00014/cand (0.6%)
v2 Authentic (Large) 0.02312/cand (12.3%) 0.00304/cand (4.3%) 0.00075/cand (1.7%) 0.00013/cand (0.6%)
v3 Professional (Large) 0.01647/cand (10.2%) 0.00208/cand (3.5%) 0.00073/cand (2.0%) 0.00007/cand (0.4%)
v4 Expressive (Large) 0.02374/cand (12.4%) 0.00348/cand (4.8%) -0.00001/cand (-0.0%) 0.00038/cand (1.7%)
v5 Cinematic (Large) 0.02080/cand (11.4%) 0.00309/cand (4.6%) 0.00028/cand (0.7%) 0.00036/cand (1.7%)
v6 Natural (Small) 0.01864/cand (13.6%) 0.00135/cand (2.6%) 0.00106/cand (3.3%) 0.00046/cand (2.8%)
v7 Authentic (Small) 0.01745/cand (14.4%) 0.00202/cand (4.4%) 0.00072/cand (2.5%) 0.00029/cand (2.0%)
v8 Professional (Small) 0.01251/cand (11.0%) 0.00099/cand (2.3%) 0.00101/cand (3.9%) 0.00007/cand (0.5%)
v9 Natural−Robotic (Large) 0.03790/cand (11.6%) 0.00408/cand (3.4%) 0.00122/cand (1.6%) 0.00018/cand (0.5%)
v10 Authentic−Cheap (Large) 0.04161/cand (12.4%) 0.00467/cand (3.7%) 0.00114/cand (1.5%) 0.00019/cand (0.5%)
v11 Professional−Distorted (Large) 0.03606/cand (11.2%) 0.00414/cand (3.5%) 0.00152/cand (2.0%) 0.00007/cand (0.2%)
v12 Expressive−Flat (Large) 0.03747/cand (11.4%) 0.00551/cand (4.5%) 0.00030/cand (0.4%) 0.00041/cand (1.1%)
v13 FullPos−FullNeg (Large) 0.03819/cand (11.8%) 0.00380/cand (3.1%) 0.00133/cand (1.8%) 0.00014/cand (0.4%)
v14 Warm−Robotic (Large) 0.03490/cand (11.4%) 0.00393/cand (3.4%) 0.00114/cand (1.6%) 0.00008/cand (0.2%)
v15 Natural−Robotic (Small) 0.03949/cand (12.1%) 0.00467/cand (3.8%) 0.00080/cand (1.1%) 0.00031/cand (0.8%)
v16 Authentic−Cheap (Small) 0.04149/cand (12.9%) 0.00585/cand (4.9%) 0.00026/cand (0.3%) 0.00048/cand (1.3%)
v17 Professional−Distorted (Small) 0.03205/cand (10.7%) 0.00448/cand (4.1%) 0.00090/cand (1.3%) 0.00041/cand (1.2%)
v18 Expressive−Flat (Small) 0.03670/cand (11.8%) 0.00497/cand (4.3%) 0.00047/cand (0.6%) 0.00084/cand (2.3%)
v19 FullPos−FullNeg (Small) 0.03544/cand (12.2%) 0.00352/cand (3.2%) 0.00107/cand (1.6%) 0.00033/cand (1.0%)
v20 Warm−Robotic (Small) 0.03786/cand (12.0%) 0.00466/cand (4.0%) 0.00121/cand (1.7%) 0.00043/cand (1.1%)
v21 Sanitized Prompt (Large) 0.02600/cand (12.4%) 0.00376/cand (4.8%) 0.00039/cand (0.8%) 0.00020/cand (0.8%)
v22 Sanitized Prompt (Small) 0.01609/cand (11.2%) 0.00224/cand (4.2%) -0.00004/cand (-0.1%) 0.00052/cand (3.1%)
v23 Sanitized−Uncanny (Large) 0.04325/cand (12.2%) 0.00489/cand (3.7%) 0.00100/cand (1.2%) 0.00020/cand (0.5%)
v24 Sanitized−Uncanny (Small) 0.03254/cand (10.5%) 0.00560/cand (4.9%) -0.00004/cand (-0.1%) 0.00061/cand (1.7%)
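Each cell above can be recomputed from two Avg Best values. A Python sketch for the Standard row; it reproduces the percentages, while last-digit differences in the per-candidate values (0.08378 here vs. 0.08379 in the table) presumably come from the table being computed on unrounded averages. `marginal_steps` is an illustrative name.

```python
from __future__ import annotations


def marginal_steps(avg_best: dict[int, float]) -> list[tuple[int, int, float, float]]:
    """For consecutive N values return (n_lo, n_hi, gain per added candidate,
    step gain as a percentage of Avg Best at n_lo)."""
    ns = sorted(avg_best)
    steps = []
    for lo, hi in zip(ns, ns[1:]):
        delta = avg_best[hi] - avg_best[lo]
        steps.append((lo, hi, delta / (hi - lo), 100.0 * delta / avg_best[lo]))
    return steps


if __name__ == "__main__":
    # Standard row (Avg Best), copied from the cross-method table.
    standard = {5: 3.6402, 10: 4.0591, 25: 4.2778, 50: 4.3003, 100: 4.3509}
    for lo, hi, per_cand, pct in marginal_steps(standard):
        print(f"N={lo}->{hi}: {per_cand:.5f}/cand ({pct:.1f}%)")
```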

Ablation: Pronunciation Suffix Effect

Comparing N=10 without the pronunciation suffix against N=10 with it; each Δ is the with-suffix value minus the without-suffix value.

Ranking Method | Without Suffix (N=10): Mean Best Median | With Suffix (N=10): Mean Best Median | Δ Mean Δ Best
Standard: (1−WER) × Content Enjoyment | 3.1170 4.1254 3.2023 | 3.2409 3.9538 3.3013 | +0.1239 -0.1717
VoiceCLAP-Large × Quality Text | 0.7321 0.9614 0.7537 | 0.7569 0.9062 0.7733 | +0.0247 -0.0553
VoiceCLAP-Small × Quality Text | 0.5502 0.7465 0.5682 | 0.5635 0.7113 0.5728 | +0.0133 -0.0352
VoiceCLAP-Large × Prompt Match | 1.0737 1.4261 1.1058 | 1.1218 1.3625 1.1424 | +0.0481 -0.0636
VoiceCLAP-Small × Prompt Match | 0.6156 0.8121 0.6283 | 0.6317 0.7866 0.6306 | +0.0161 -0.0255
v1 Natural (Large) | 0.7875 1.0529 0.8107 | 0.8175 0.9852 0.8363 | +0.0300 -0.0677
v2 Authentic (Large) | 0.8057 1.0781 0.8293 | 0.8421 1.0161 0.8603 | +0.0364 -0.0621
v3 Professional (Large) | 0.6911 0.9015 0.7115 | 0.7118 0.8542 0.7277 | +0.0207 -0.0473
v4 Expressive (Large) | 0.8209 1.0930 0.8414 | 0.8571 1.0353 0.8721 | +0.0362 -0.0577
v5 Cinematic (Large) | 0.7752 1.0270 0.7931 | 0.8078 0.9805 0.8245 | +0.0326 -0.0465
v6 Natural (Small) | 0.5778 0.7841 0.5937 | 0.6001 0.7481 0.6115 | +0.0223 -0.0360
v7 Authentic (Small) | 0.5134 0.7102 0.5244 | 0.5416 0.6790 0.5508 | +0.0282 -0.0312
v8 Professional (Small) | 0.4695 0.6360 0.4831 | 0.4767 0.5988 0.4841 | +0.0072 -0.0372
v9 Natural−Robotic (Large) | 1.3927 1.8408 1.4285 | 1.4487 1.7432 1.4791 | +0.0560 -0.0975
v10 Authentic−Cheap (Large) | 1.4413 1.9180 1.4803 | 1.5089 1.8300 1.5369 | +0.0675 -0.0879
v11 Professional−Distorted (Large) | 1.3694 1.8197 1.4082 | 1.4237 1.7206 1.4517 | +0.0543 -0.0991
v12 Expressive−Flat (Large) | 1.3944 1.8412 1.4290 | 1.4538 1.7604 1.4788 | +0.0594 -0.0808
v13 FullPos−FullNeg (Large) | 1.3882 1.8375 1.4288 | 1.4461 1.7389 1.4772 | +0.0579 -0.0986
v14 Warm−Robotic (Large) | 1.3159 1.7272 1.3539 | 1.3694 1.6465 1.3960 | +0.0535 -0.0806
v15 Natural−Robotic (Small) | 1.3953 1.8494 1.4316 | 1.4535 1.7761 1.4817 | +0.0583 -0.0733
v16 Authentic−Cheap (Small) | 1.3594 1.8379 1.3917 | 1.4322 1.7727 1.4477 | +0.0728 -0.0652
v17 Professional−Distorted (Small) | 1.2752 1.6811 1.3130 | 1.3261 1.5999 1.3582 | +0.0509 -0.0811
v18 Expressive−Flat (Small) | 1.3127 1.7830 1.3420 | 1.3698 1.7062 1.3795 | +0.0572 -0.0768
v19 FullPos−FullNeg (Small) | 1.2411 1.6452 1.2778 | 1.2903 1.5751 1.3187 | +0.0492 -0.0700
v20 Warm−Robotic (Small) | 1.3391 1.7724 1.3682 | 1.3857 1.7104 1.4059 | +0.0466 -0.0621
v21 Sanitized Prompt (Large) | 0.8911 1.2023 0.9179 | 0.9263 1.1374 0.9412 | +0.0352 -0.0649
v22 Sanitized Prompt (Small) | 0.5978 0.8030 0.6087 | 0.6109 0.7625 0.6133 | +0.0131 -0.0405
v23 Sanitized−Uncanny (Large) | 1.5126 2.0191 1.5557 | 1.5725 1.9158 1.6003 | +0.0599 -0.1032
v24 Sanitized−Uncanny (Small) | 1.3058 1.7431 1.3361 | 1.3486 1.6551 1.3620 | +0.0428 -0.0880

Per-Prompt Ablation: Standard Reward (N=10)

# Lang | No Suffix: Mean Best | With Suffix: Mean Best | Δ Mean Δ Best
0 English | 3.6474 4.5642 | 3.4676 3.7627 | -0.1798 -0.8015
1 French | 3.4752 5.1613 | 4.3190 5.0907 | +0.8438 -0.0706
2 English | 1.5328 2.6555 | 1.2668 1.3606 | -0.2660 -1.2949
3 German | 4.9209 5.0105 | 4.9625 5.0478 | +0.0416 +0.0373
4 French | 4.2007 4.5798 | 3.9954 4.5486 | -0.2053 -0.0312
5 French | 1.5720 4.0101 | 1.5375 4.1837 | -0.0345 +0.1736
6 English | 3.3146 3.8915 | 3.0931 3.7265 | -0.2214 -0.1650
7 German | 2.1635 2.8666 | 2.4848 2.9931 | +0.3212 +0.1265
8 Spanish | 2.8079 4.5547 | 3.6050 4.5620 | +0.7971 +0.0073
9 French | 3.5353 3.9602 | 3.6773 4.2620 | +0.1420 +0.3018
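The per-prompt table aggregates to the Standard row of the ablation table above. A small Python sketch that recomputes the column averages and the Δ columns (with suffix minus without suffix) from the per-prompt values; the names are illustrative only.

```python
from __future__ import annotations

# Standard reward at N=10 per prompt, copied from the table above:
# (no-suffix mean, no-suffix best, with-suffix mean, with-suffix best)
ROWS: dict[int, tuple[float, float, float, float]] = {
    0: (3.6474, 4.5642, 3.4676, 3.7627),
    1: (3.4752, 5.1613, 4.3190, 5.0907),
    2: (1.5328, 2.6555, 1.2668, 1.3606),
    3: (4.9209, 5.0105, 4.9625, 5.0478),
    4: (4.2007, 4.5798, 3.9954, 4.5486),
    5: (1.5720, 4.0101, 1.5375, 4.1837),
    6: (3.3146, 3.8915, 3.0931, 3.7265),
    7: (2.1635, 2.8666, 2.4848, 2.9931),
    8: (2.8079, 4.5547, 3.6050, 4.5620),
    9: (3.5353, 3.9602, 3.6773, 4.2620),
}


def deltas(row: tuple[float, float, float, float]) -> tuple[float, float]:
    """(Δ Mean, Δ Best), each defined as with suffix minus without suffix."""
    no_mean, no_best, sx_mean, sx_best = row
    return sx_mean - no_mean, sx_best - no_best


def column_means(rows: dict[int, tuple[float, ...]]) -> tuple[float, ...]:
    """Average each column over all prompts."""
    n = len(rows)
    width = len(next(iter(rows.values())))
    return tuple(sum(r[i] for r in rows.values()) / n for i in range(width))


if __name__ == "__main__":
    print([round(x, 4) for x in column_means(ROWS)])  # [3.117, 4.1254, 3.2409, 3.9538]
    improved_mean = [p for p, r in ROWS.items() if deltas(r)[0] > 0]
    improved_best = [p for p, r in ROWS.items() if deltas(r)[1] > 0]
    print("Δ Mean > 0 for prompts", improved_mean)  # [1, 3, 7, 8, 9]
    print("Δ Best > 0 for prompts", improved_best)  # [3, 5, 7, 8, 9]
```

With these inputs the suffix raises the mean reward for 5 of the 10 prompts (1, 3, 7, 8, 9) and the best reward for 5 (3, 5, 7, 8, 9).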

Standard: (1−WER) × Content Enjoyment — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 3.6402 4.0591 4.2778 4.3003 4.3509
Std Dev 1.3837 1.0022 0.8641 0.8529 0.8602
Avg Mean 3.0269 3.2295 3.1811 3.1568 3.1818
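The summary rows can be reproduced from the per-prompt bests: Avg Best is the mean of the 10 per-prompt best rewards, and Std Dev matches the sample standard deviation (n − 1 denominator) across prompts. At N=100, the per-prompt bests listed below give 4.3509 and 0.8602. Avg Mean is presumably the analogous average of each prompt's mean reward over its N candidates, but per-prompt means are not listed, so this Python sketch covers only the first two rows; `summarize` is an illustrative name.

```python
from __future__ import annotations

import statistics

# Per-prompt best reward at N=100 for the Standard method (prompts 0-9),
# copied from the per-prompt table below.
BEST_N100 = [4.8220, 5.1613, 2.7075, 5.0811, 4.8282,
             4.8829, 3.9870, 3.0241, 4.7238, 4.2913]


def summarize(per_prompt_best: list[float]) -> tuple[float, float]:
    """Return (Avg Best, Std Dev) across prompts.

    statistics.stdev uses the n - 1 (sample) denominator, which is what
    reproduces the reported Std Dev.
    """
    return statistics.mean(per_prompt_best), statistics.stdev(per_prompt_best)


if __name__ == "__main__":
    avg, sd = summarize(BEST_N100)
    print(f"Avg Best {avg:.4f}  Std Dev {sd:.4f}")  # Avg Best 4.3509  Std Dev 0.8602
```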

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 4.6083 4.4591 4.7830 4.8220 4.8220
1 French 4.8813 4.9086 4.9463 4.9280 5.1613
2 English 2.0031 2.0203 2.7075 2.6555 2.7075
3 German 4.9534 4.9980 5.0693 5.0811 5.0811
4 French 4.6051 4.6821 4.8282 4.6821 4.8282
5 French 0.8635 4.0968 4.8829 4.8829 4.8829
6 English 3.8327 3.8963 3.9148 3.9870 3.9870
7 German 2.5382 2.5564 2.8891 2.9558 3.0241
8 Spanish 3.8250 4.6863 4.6745 4.7173 4.7238
9 French 4.2913 4.2876 4.0820 4.2913 4.2913
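The per-prompt bests above are not monotone in N (prompt 0 falls from 4.6083 at N=5 to 4.4591 at N=10, and prompt 9 from 4.2913 at N=5 to 4.0820 at N=25), which could not happen if each N-candidate set extended the previous one; the sets appear to be drawn separately, so each cell is a single draw. When an exact expectation over N-subsets of one fixed pool is wanted instead, it can be computed in closed form from order statistics, and it never decreases as N grows. The Python sketch below shows that standard technique; it is not necessarily what produced these tables, and `expected_best_of_n` is a hypothetical helper.

```python
from __future__ import annotations

from itertools import combinations
from math import comb
from statistics import mean


def expected_best_of_n(rewards: list[float], n: int) -> float:
    """Exact mean of max(subset) over all size-n subsets of the pool.

    With rewards sorted ascending, the i-th smallest (1-based) is the
    maximum of a uniformly random n-subset with probability
    C(i - 1, n - 1) / C(M, n); math.comb returns 0 when i - 1 < n - 1.
    """
    m = len(rewards)
    if not 1 <= n <= m:
        raise ValueError("n must be between 1 and the pool size")
    ranked = sorted(rewards)
    weighted = sum(r * comb(i - 1, n - 1) for i, r in enumerate(ranked, start=1))
    return weighted / comb(m, n)


def brute_force_best_of_n(rewards: list[float], n: int) -> float:
    """Reference check for small pools: average max over every n-subset."""
    return mean(max(subset) for subset in combinations(rewards, n))


if __name__ == "__main__":
    pool = [1.0, 2.0, 3.0, 4.0]
    print(expected_best_of_n(pool, 2))     # ≈ 3.3333 (= 20 / 6)
    print(brute_force_best_of_n(pool, 2))  # same value
```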

VoiceCLAP-Large × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.8578 0.9465 0.9788 0.9977 1.0018
Std Dev 0.3412 0.2305 0.1847 0.1892 0.1918
Avg Mean 0.7144 0.7636 0.7474 0.7425 0.7478

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.1195 1.1043 1.1422 1.1702 1.1702
1 French 1.1256 1.1412 1.1412 1.1563 1.1563
2 English 0.5421 0.5575 0.7208 0.7312 0.7312
3 German 1.1796 1.1908 1.1733 1.1936 1.1936
4 French 1.1615 1.1768 1.2063 1.2089 1.2377
5 French 0.1678 0.8056 0.8953 0.8953 0.8953
6 English 0.8742 0.9124 0.9038 0.9403 0.9403
7 German 0.6078 0.6167 0.7145 0.7223 0.7253
8 Spanish 0.7136 0.8812 0.8644 0.8721 0.8820
9 French 1.0866 1.0783 1.0258 1.0866 1.0866

VoiceCLAP-Small × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.6678 0.7458 0.7714 0.7913 0.7979
Std Dev 0.3225 0.2501 0.2388 0.2359 0.2423
Avg Mean 0.5468 0.5890 0.5705 0.5685 0.5713

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.7521 0.7519 0.7405 0.7767 0.7767
1 French 0.9093 0.9121 0.9993 0.9993 1.0447
2 English 0.3477 0.4046 0.4725 0.5119 0.5119
3 German 1.0506 1.0508 1.0812 1.0851 1.0851
4 French 0.9676 1.0207 1.0229 1.0462 1.0623
5 French 0.0980 0.4484 0.5258 0.5258 0.5258
6 English 0.6428 0.7003 0.7003 0.7401 0.7401
7 German 0.4432 0.4763 0.5173 0.5226 0.5226
8 Spanish 0.4619 0.6637 0.6329 0.6582 0.6637
9 French 1.0045 1.0291 1.0213 1.0466 1.0466

VoiceCLAP-Large × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.2582 1.4203 1.4750 1.4850 1.5013
Std Dev 0.5172 0.4006 0.3412 0.3477 0.3418
Avg Mean 1.0502 1.1239 1.1059 1.0967 1.1057

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.6427 1.6722 1.7471 1.7397 1.7471
1 French 1.7847 1.8246 1.8036 1.8339 1.8339
2 English 0.6011 0.6206 0.8301 0.7949 0.8301
3 German 1.8622 1.8656 1.8715 1.8756 1.8791
4 French 1.5389 1.5324 1.5389 1.5389 1.5389
5 French 0.3043 1.4622 1.6611 1.6611 1.6611
6 English 1.2764 1.3363 1.3348 1.3881 1.3881
7 German 0.8482 0.8471 1.0189 1.0189 1.0189
8 Spanish 1.3010 1.5888 1.5870 1.5757 1.6226
9 French 1.4229 1.4531 1.3569 1.4229 1.4931

VoiceCLAP-Small × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.7316 0.8069 0.8436 0.8484 0.8696
Std Dev 0.3411 0.2717 0.2451 0.2633 0.2508
Avg Mean 0.6001 0.6379 0.6293 0.6247 0.6280

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.1948 1.2023 1.2259 1.2256 1.2259
1 French 0.7994 0.7983 0.8327 0.8100 0.8544
2 English 0.3176 0.3177 0.3970 0.3843 0.3970
3 German 1.1007 1.1006 1.1086 1.1588 1.1588
4 French 0.7929 0.8113 0.8491 0.8113 0.8632
5 French 0.1659 0.7888 0.8242 0.8637 0.8637
6 English 1.0618 1.0851 1.0802 1.1144 1.1144
7 German 0.5032 0.5129 0.6322 0.5742 0.6322
8 Spanish 0.5634 0.6746 0.7005 0.7043 0.7494
9 French 0.8167 0.7778 0.7856 0.8371 0.8371

v1 Natural (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.9245 1.0314 1.0628 1.0900 1.0968
Std Dev 0.3525 0.2195 0.1794 0.1782 0.1780
Avg Mean 0.7693 0.8212 0.8064 0.8001 0.8062

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.2375 1.2006 1.2722 1.2924 1.2924
1 French 1.2089 1.2266 1.2266 1.2551 1.2551
2 English 0.5982 0.6299 0.7864 0.8460 0.8460
3 German 1.1755 1.1767 1.1561 1.1997 1.1997
4 French 1.1980 1.2235 1.2831 1.2667 1.2893
5 French 0.1907 0.9233 1.0033 1.0033 1.0033
6 English 0.9372 0.9771 0.9757 1.0136 1.0136
7 German 0.7112 0.7354 0.8388 0.8415 0.8468
8 Spanish 0.7690 0.9826 0.9350 0.9497 0.9826
9 French 1.2191 1.2387 1.1507 1.2316 1.2387

v2 Authentic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.9405 1.0561 1.1017 1.1205 1.1271
Std Dev 0.3460 0.2288 0.1716 0.1691 0.1730
Avg Mean 0.7834 0.8410 0.8255 0.8193 0.8261

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.1102 1.0841 1.1505 1.1660 1.1660
1 French 1.2789 1.2996 1.2996 1.3347 1.3347
2 English 0.5769 0.6119 0.7823 0.8180 0.8180
3 German 1.2344 1.2412 1.2337 1.2561 1.2564
4 French 1.2191 1.2360 1.2786 1.2623 1.2850
5 French 0.2084 1.0084 1.1398 1.1398 1.1398
6 English 0.9398 0.9662 1.0042 1.0325 1.0325
7 German 0.7291 0.7394 0.8584 0.8641 0.8641
8 Spanish 0.9287 1.1551 1.1216 1.1382 1.1551
9 French 1.1798 1.2196 1.1483 1.1935 1.2196

v3 Professional (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.8097 0.8920 0.9233 0.9415 0.9450
Std Dev 0.3253 0.2195 0.1760 0.1733 0.1754
Avg Mean 0.6754 0.7206 0.7056 0.7009 0.7059

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.0458 1.0497 1.0755 1.1026 1.1026
1 French 1.0602 1.0708 1.0708 1.0760 1.0909
2 English 0.4986 0.5323 0.6918 0.7076 0.7076
3 German 1.1334 1.1367 1.1302 1.1379 1.1379
4 French 1.1034 1.1154 1.1366 1.1291 1.1409
5 French 0.1556 0.7525 0.8220 0.8220 0.8220
6 English 0.8415 0.8782 0.8718 0.9155 0.9155
7 German 0.5702 0.5700 0.6674 0.6906 0.6906
8 Spanish 0.6748 0.8247 0.8116 0.8207 0.8288
9 French 1.0134 0.9901 0.9548 1.0134 1.0134

v4 Expressive (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.9601 1.0789 1.1310 1.1307 1.1498
Std Dev 0.3556 0.2437 0.1953 0.1896 0.1989
Avg Mean 0.7975 0.8569 0.8392 0.8330 0.8400

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.2292 1.2237 1.3007 1.2799 1.3007
1 French 1.3110 1.3405 1.3405 1.3405 1.3405
2 English 0.5909 0.6097 0.8132 0.8219 0.8219
3 German 1.1202 1.1382 1.1414 1.1530 1.1530
4 French 1.2221 1.2307 1.2634 1.2634 1.3278
5 French 0.2259 1.0865 1.2146 1.2146 1.2146
6 English 1.1729 1.2011 1.2346 1.1959 1.2346
7 German 0.6567 0.6622 0.7654 0.7672 0.7689
8 Spanish 0.9263 1.1515 1.1317 1.1242 1.1518
9 French 1.1462 1.1444 1.1044 1.1462 1.1840

v5 Cinematic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.9141 1.0181 1.0645 1.0716 1.0897
Std Dev 0.3525 0.2444 0.1981 0.1935 0.2040
Avg Mean 0.7530 0.8110 0.7932 0.7873 0.7938

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.0680 1.0384 1.1233 1.0920 1.1233
1 French 1.3144 1.3057 1.2605 1.2766 1.3144
2 English 0.5464 0.5531 0.7303 0.7446 0.7446
3 German 1.1596 1.1531 1.1710 1.1665 1.1763
4 French 1.2114 1.2192 1.2680 1.2438 1.2680
5 French 0.2241 1.0629 1.1896 1.1896 1.1896
6 English 1.0477 1.0780 1.0791 1.1309 1.1309
7 German 0.5975 0.6191 0.7089 0.7196 0.7196
8 Spanish 0.8344 1.0116 1.0107 1.0143 1.0300
9 French 1.1378 1.1401 1.1033 1.1378 1.2003

v6 Natural (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.6865 0.7797 0.7999 0.8265 0.8493
Std Dev 0.3184 0.2540 0.2218 0.2110 0.2365
Avg Mean 0.5611 0.6082 0.5920 0.5884 0.5931

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.5042 0.5158 0.5308 0.5636 0.5636
1 French 1.0793 1.0714 1.1328 1.1328 1.1968
2 English 0.3353 0.4223 0.4961 0.5452 0.5452
3 German 0.9399 0.9987 0.9987 0.9915 1.0523
4 French 0.9445 1.0088 0.9445 1.0088 1.0257
5 French 0.1246 0.6050 0.7106 0.7106 0.7106
6 English 0.6721 0.7151 0.7101 0.7898 0.7898
7 German 0.5601 0.5672 0.6213 0.6540 0.6540
8 Spanish 0.6478 0.7696 0.8124 0.8106 0.8322
9 French 1.0571 1.1232 1.0420 1.0576 1.1232

v7 Authentic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.6046 0.6918 0.7222 0.7402 0.7547
Std Dev 0.2713 0.2118 0.2069 0.1993 0.2144
Avg Mean 0.4936 0.5373 0.5238 0.5202 0.5253

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.4689 0.4771 0.5077 0.5184 0.5184
1 French 0.9485 0.9631 1.0459 1.0459 1.1041
2 English 0.3231 0.4319 0.5017 0.5160 0.5160
3 German 0.8994 0.9610 0.9610 0.9877 0.9877
4 French 0.7493 0.8390 0.7823 0.8481 0.8738
5 French 0.1199 0.5811 0.6512 0.6512 0.6512
6 English 0.5162 0.5560 0.5574 0.6228 0.6228
7 German 0.4686 0.4701 0.4908 0.5243 0.5243
8 Spanish 0.6786 0.7493 0.8443 0.8149 0.8590
9 French 0.8732 0.8898 0.8794 0.8732 0.8898

v8 Professional (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.5712 0.6338 0.6486 0.6738 0.6775
Std Dev 0.2654 0.2099 0.1875 0.1898 0.1922
Avg Mean 0.4710 0.5010 0.4847 0.4840 0.4869

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.8187 0.8392 0.7923 0.8449 0.8449
1 French 0.6998 0.7245 0.7640 0.7640 0.7969
2 English 0.3383 0.3970 0.4577 0.5254 0.5254
3 German 0.8678 0.8678 0.8736 0.8964 0.8964
4 French 0.7916 0.8587 0.8389 0.8784 0.8815
5 French 0.0688 0.3298 0.3682 0.3682 0.3682
6 English 0.5726 0.6132 0.6256 0.6504 0.6504
7 German 0.3901 0.3964 0.4588 0.4584 0.4588
8 Spanish 0.3822 0.5239 0.5052 0.5474 0.5474
9 French 0.7822 0.7872 0.8016 0.8049 0.8049

v9 Natural−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6291 1.8186 1.8798 1.9103 1.9195
Std Dev 0.6080 0.4153 0.3251 0.3118 0.3149
Avg Mean 1.3579 1.4514 1.4249 1.4155 1.4251

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0727 2.0516 2.1587 2.1570 2.1587
1 French 2.1453 2.1363 2.1517 2.1581 2.1581
2 English 0.9398 0.9151 1.1917 1.2604 1.2604
3 German 2.1463 2.1568 2.1518 2.1776 2.1776
4 French 2.0452 2.1136 2.1586 2.1147 2.1586
5 French 0.3628 1.7799 1.9490 1.9490 1.9490
6 English 1.6935 1.7594 1.7594 1.8424 1.8424
7 German 1.2761 1.2796 1.4900 1.4900 1.4942
8 Spanish 1.5271 1.8963 1.8559 1.8640 1.8981
9 French 2.0823 2.0977 1.9308 2.0902 2.0977

v10 Authentic−Cheap (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6832 1.8913 1.9613 1.9899 1.9996
Std Dev 0.6369 0.4517 0.3553 0.3491 0.3538
Avg Mean 1.3981 1.5046 1.4762 1.4664 1.4776

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.8825 1.8513 1.9756 1.9565 1.9756
1 French 2.3525 2.3542 2.3615 2.3857 2.3857
2 English 0.9429 0.9609 1.2536 1.3073 1.3073
3 German 2.2178 2.2171 2.2153 2.2496 2.2496
4 French 2.2110 2.2467 2.2774 2.2444 2.2774
5 French 0.3976 1.9374 2.1557 2.1557 2.1557
6 English 1.7025 1.7518 1.7646 1.8343 1.8343
7 German 1.2646 1.2909 1.4969 1.4950 1.4969
8 Spanish 1.7063 2.1008 2.0683 2.1015 2.1124
9 French 2.1543 2.2016 2.0443 2.1690 2.2016

v11 Professional−Distorted (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6093 1.7896 1.8517 1.8897 1.8934
Std Dev 0.6226 0.4050 0.3213 0.3086 0.3125
Avg Mean 1.3334 1.4292 1.4000 1.3909 1.4017

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0551 2.0254 2.1096 2.1381 2.1381
1 French 2.1536 2.1758 2.1758 2.1941 2.1941
2 English 0.9734 1.0393 1.3434 1.3864 1.3864
3 German 2.0874 2.0829 2.0547 2.0935 2.0935
4 French 2.1538 2.1594 2.2287 2.2064 2.2405
5 French 0.3494 1.6953 1.8563 1.8563 1.8563
6 English 1.7036 1.7410 1.7122 1.7939 1.7939
7 German 1.1109 1.1450 1.3212 1.3771 1.3771
8 Spanish 1.4541 1.7996 1.7722 1.7988 1.8021
9 French 2.0519 2.0325 1.9430 2.0519 2.0519

v12 Expressive−Flat (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6379 1.8253 1.9079 1.9154 1.9357
Std Dev 0.6148 0.4403 0.3590 0.3562 0.3609
Avg Mean 1.3553 1.4549 1.4256 1.4162 1.4266

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.1003 2.1003 2.2189 2.1747 2.2189
1 French 2.2520 2.2560 2.2560 2.2560 2.2573
2 English 0.9066 0.8806 1.1945 1.1769 1.1945
3 German 2.0319 2.0390 2.0693 2.0693 2.0693
4 French 2.0438 2.0554 2.1211 2.0741 2.1539
5 French 0.3928 1.9088 2.1245 2.1245 2.1245
6 English 1.9189 1.9542 1.9603 2.0076 2.0076
7 German 1.1688 1.1635 1.3594 1.3594 1.3629
8 Spanish 1.6018 1.9741 1.9580 1.9491 1.9909
9 French 1.9623 1.9208 1.8171 1.9623 1.9769

v13 FullPos−FullNeg (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6227 1.8137 1.8707 1.9040 1.9110
Std Dev 0.6338 0.4351 0.3488 0.3464 0.3495
Avg Mean 1.3515 1.4499 1.4210 1.4121 1.4221

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0521 2.0285 2.1049 2.1275 2.1275
1 French 2.2797 2.2929 2.2929 2.3011 2.3011
2 English 0.9320 0.9520 1.2422 1.2692 1.2692
3 German 2.1127 2.1286 2.0941 2.1396 2.1396
4 French 2.1043 2.1308 2.2110 2.1680 2.2110
5 French 0.3648 1.7720 1.9389 1.9389 1.9389
6 English 1.6790 1.7274 1.7274 1.8241 1.8241
7 German 1.1466 1.1816 1.3573 1.3670 1.3670
8 Spanish 1.4695 1.8327 1.7913 1.8047 1.8327
9 French 2.0864 2.0903 1.9472 2.0994 2.0994

v14 Warm−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.5354 1.7098 1.7688 1.7972 1.8013
Std Dev 0.5916 0.4193 0.3370 0.3345 0.3334
Avg Mean 1.2830 1.3725 1.3460 1.3376 1.3464

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.9702 1.9651 2.0625 2.0661 2.0661
1 French 2.0386 2.0441 2.0573 2.0598 2.0598
2 English 0.8733 0.8413 1.1193 1.1351 1.1351
3 German 2.1652 2.1834 2.1676 2.1864 2.1864
4 French 1.9097 1.9656 2.0153 2.0034 2.0153
5 French 0.3372 1.6433 1.8096 1.8096 1.8096
6 English 1.6043 1.6463 1.6331 1.7261 1.7261
7 German 1.1345 1.1494 1.3330 1.3330 1.3443
8 Spanish 1.4368 1.7590 1.7458 1.7529 1.7695
9 French 1.8838 1.9009 1.7442 1.8999 1.9009

v15 Natural−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6384 1.8358 1.9058 1.9259 1.9414
Std Dev 0.6687 0.4751 0.4045 0.3996 0.4093
Avg Mean 1.3570 1.4600 1.4272 1.4173 1.4276

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.6948 1.7089 1.7841 1.8101 1.8101
1 French 2.3612 2.3268 2.4076 2.3951 2.4527
2 English 0.8562 0.9145 1.1865 1.1870 1.1870
3 German 2.2969 2.3188 2.3448 2.3448 2.3549
4 French 2.1632 2.2464 2.2989 2.2786 2.3003
5 French 0.3583 1.7506 1.9350 1.9350 1.9350
6 English 1.6269 1.7462 1.7045 1.7911 1.7911
7 German 1.2255 1.2429 1.3957 1.4453 1.4453
8 Spanish 1.5602 1.8737 1.8971 1.8309 1.8971
9 French 2.2408 2.2295 2.1041 2.2408 2.2408

v16 Authentic−Cheap (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6024 1.8099 1.8977 1.9041 1.9280
Std Dev 0.6573 0.5061 0.4549 0.4405 0.4600
Avg Mean 1.3129 1.4173 1.3900 1.3784 1.3898

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.4668 1.4293 1.5561 1.5558 1.5561
1 French 2.4247 2.4184 2.4913 2.4283 2.5349
2 English 0.8353 0.8968 1.1359 1.1275 1.1359
3 German 2.3880 2.4561 2.4757 2.4860 2.4914
4 French 1.9562 2.0757 2.1254 2.0757 2.1254
5 French 0.3954 1.9223 2.0922 2.0922 2.0922
6 English 1.5027 1.6135 1.6135 1.7000 1.7000
7 German 1.2232 1.2393 1.3968 1.4144 1.4144
8 Spanish 1.7489 2.0128 2.1290 2.0783 2.1461
9 French 2.0831 2.0348 1.9611 2.0831 2.0831

v17 Professional−Distorted (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.4980 1.6583 1.7254 1.7480 1.7686
Std Dev 0.6035 0.4475 0.3630 0.3722 0.3740
Avg Mean 1.2447 1.3336 1.3066 1.2987 1.3075

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.9541 1.9454 2.0261 2.0680 2.0680
1 French 1.9884 2.0251 2.0264 2.0264 2.0895
2 English 0.8184 0.7697 1.0209 1.0489 1.0489
3 German 2.0171 2.0510 2.0893 2.0641 2.0893
4 French 1.9795 2.0700 2.0562 2.0887 2.1237
5 French 0.3070 1.4988 1.6613 1.6613 1.6613
6 English 1.6094 1.6094 1.6682 1.7287 1.7287
7 German 1.0425 1.0444 1.2497 1.1923 1.2497
8 Spanish 1.3218 1.6518 1.5991 1.6594 1.6852
9 French 1.9419 1.9171 1.8572 1.9419 1.9419

v18 Expressive−Flat (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.5510 1.7345 1.8091 1.8207 1.8629
Std Dev 0.5825 0.4153 0.3175 0.3181 0.3351
Avg Mean 1.2740 1.3556 1.3330 1.3252 1.3340

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.9827 2.0572 2.0572 2.0621 2.0621
1 French 2.0235 2.0021 2.0021 2.0021 2.1373
2 English 0.8092 0.8225 1.1438 1.1028 1.1438
3 German 1.9407 1.9619 1.9888 1.9619 2.0188
4 French 1.9734 1.9927 2.1350 1.9925 2.1509
5 French 0.3588 1.6840 1.8924 1.9407 1.9407
6 English 1.7205 1.7957 1.7525 1.8148 1.8207
7 German 1.1635 1.1604 1.3597 1.3820 1.3820
8 Spanish 1.5839 2.0063 1.9304 1.9938 2.0184
9 French 1.9541 1.8626 1.8286 1.9541 1.9541

v19 FullPos−FullNeg (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.4562 1.6334 1.6862 1.7129 1.7295
Std Dev 0.5568 0.3881 0.3136 0.3103 0.3237
Avg Mean 1.2152 1.2975 1.2696 1.2609 1.2708

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.7601 1.7635 1.8438 1.9148 1.9148
1 French 1.9580 2.0140 2.0440 2.0440 2.0877
2 English 0.8341 0.8534 1.1086 1.1041 1.1086
3 German 1.9933 2.0247 2.0218 2.0133 2.0537
4 French 1.8444 1.9298 1.9589 1.9117 1.9713
5 French 0.3026 1.5087 1.6523 1.6523 1.6523
6 English 1.4453 1.5188 1.5103 1.5855 1.5855
7 German 1.1276 1.1279 1.2679 1.3026 1.3026
8 Spanish 1.4169 1.7389 1.7025 1.7210 1.7389
9 French 1.8794 1.8542 1.7517 1.8794 1.8794

v20 Warm−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

          N=5     N=10    N=25    N=50    N=100
Avg Best  1.5747  1.7640  1.8339  1.8643  1.8857
Std Dev   0.6634  0.4736  0.4052  0.4185  0.4148
Avg Mean  1.3063  1.3998  1.3689  1.3601  1.3679

Per-Prompt Best Reward by N

#  Lang     N=5     N=10    N=25    N=50    N=100
0  English  2.0071  1.9769  2.0496  2.0496  2.0648
1  French   2.0524  1.9854  2.1039  2.1039  2.1665
2  English  0.8100  0.8181  1.1035  1.0783  1.1035
3  German   2.2472  2.2402  2.2904  2.3111  2.3111
4  French   2.1160  2.2734  2.3291  2.3291  2.3291
5  French   0.3379  1.6241  1.7747  1.7921  1.7921
6  English  1.5238  1.6897  1.6237  1.7048  1.7530
7  German   1.0790  1.1274  1.3161  1.3248  1.3248
8  Spanish  1.3700  1.8080  1.7130  1.7451  1.8080
9  French   2.2038  2.0969  2.0352  2.2038  2.2038
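The names v18 to v20 (Expressive−Flat, FullPos−FullNeg, Warm−Robotic) suggest the same positive-minus-negative CLAP scoring that the methodology table defines for methods 13 to 16 (v9_nr_L to v12_ef_L), gated by WER through reward = (1 − WER) × max(score, 0). A minimal sketch follows, assuming precomputed embeddings for the audio and for both text prompts; the toy vectors and the names `warm` and `robotic` are hypothetical. Note that if v20's score were a raw cosine difference, its reward could not exceed 2 (the difference is at most 2 and 1 − WER is at most 1), yet the v20 table above reaches 2.3291, so some rescaling is presumably applied that this sketch does not model.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def contrastive_score(audio, pos_text, neg_text) -> float:
    """cos(audio, positive prompt) - cos(audio, negative prompt)."""
    return cosine(audio, pos_text) - cosine(audio, neg_text)


def reward(wer: float, score: float) -> float:
    """reward = (1 - WER) * max(score, 0), as stated in the methodology."""
    return (1.0 - wer) * max(score, 0.0)


# Toy 3-d embeddings standing in for CLAP outputs (hypothetical values).
audio = [1.0, 0.0, 0.0]
warm = [1.0, 0.0, 0.0]     # positive prompt embedding, same direction as audio
robotic = [0.0, 1.0, 0.0]  # negative prompt embedding, orthogonal to audio

s = contrastive_score(audio, warm, robotic)  # 1.0 - 0.0
print(reward(wer=0.1, score=s))              # (1 - 0.1) * 1.0
print(reward(wer=0.1, score=-0.2))           # negative score is clipped to 0
```

The max(score, 0) clip means a candidate that sits closer to the negative prompt than the positive one earns zero reward regardless of its WER.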

v21 Sanitized Prompt (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

          N=5     N=10    N=25    N=50    N=100
Avg Best  1.0507  1.1807  1.2371  1.2467  1.2569
Std Dev   0.4170  0.2947  0.2464  0.2569  0.2516
Avg Mean  0.8718  0.9292  0.9176  0.9094  0.9166

Per-Prompt Best Reward by N

#  Lang     N=5     N=10    N=25    N=50    N=100
0  English  1.4841  1.4532  1.5517  1.5681  1.5681
1  French   1.4061  1.4510  1.4510  1.4510  1.4514
2  English  0.5612  0.5944  0.7916  0.7633  0.7916
3  German   1.4112  1.3926  1.3812  1.3973  1.4112
4  French   1.2528  1.2476  1.2528  1.2528  1.2528
5  French   0.2587  1.2370  1.4143  1.4143  1.4143
6  English  1.1883  1.2423  1.2546  1.2964  1.2964
7  German   0.6933  0.7077  0.8530  0.8530  0.8530
8  Spanish  0.9591  1.2018  1.1831  1.1783  1.2158
9  French   1.2925  1.2797  1.2373  1.2925  1.3148

v22 Sanitized Prompt (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

          N=5     N=10    N=25    N=50    N=100
Avg Best  0.7210  0.8015  0.8351  0.8340  0.8599
Std Dev   0.3382  0.2701  0.2511  0.2633  0.2540
Avg Mean  0.5833  0.6207  0.6144  0.6093  0.6126
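The Large and Small sanitized-prompt runs (v21 and v22) sit at very different absolute levels, 1.2569 versus 0.8599 Avg Best at N=100. Because reward = (1 − WER) × max(score, 0) is linear in a positive score, multiplying the score by a constant multiplies every reward, and therefore every Avg Best value, by the same constant. A ratio such as Avg Best at N over Avg Best at N=5 is unaffected by that kind of rescaling, and computed from the two rows it comes out close for both runs (about 1.196 and 1.193 at N=100). This is one possible normalization, offered as a sketch; it does not remove additive offsets or any non-linear difference between the two CLAP models.

```python
# Avg Best rows copied from the v21 and v22 tables (N = 5, 10, 25, 50, 100).
NS = (5, 10, 25, 50, 100)
AVG_BEST = {
    "v21 Sanitized Prompt (Large)": (1.0507, 1.1807, 1.2371, 1.2467, 1.2569),
    "v22 Sanitized Prompt (Small)": (0.7210, 0.8015, 0.8351, 0.8340, 0.8599),
}


def relative_to_first(values):
    """Divide each Avg Best value by the N=5 value.

    The result is invariant to multiplying every reward by one positive
    constant, which is the only kind of rescaling this normalization removes.
    """
    base = values[0]
    return [v / base for v in values]


for name, row in AVG_BEST.items():
    ratios = relative_to_first(row)
    print(name, " ".join(f"N={n}:{r:.3f}" for n, r in zip(NS, ratios)))
```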

Per-Prompt Best Reward by N

#  Lang     N=5     N=10    N=25    N=50    N=100
0  English  1.2427  1.2207  1.2508  1.2726  1.2726
1  French   0.8132  0.8168  0.8185  0.8142  0.8443
2  English  0.3167  0.3229  0.3946  0.3738  0.3946
3  German   1.0059  1.0059  1.0218  1.0087  1.0545
4  French   0.7387  0.7494  0.7543  0.7494  0.8121
5  French   0.1730  0.8228  0.8774  0.8983  0.8983
6  English  1.0897  1.1302  1.1430  1.1430  1.1510
7  German   0.5155  0.5104  0.6380  0.5744  0.6380
8  Spanish  0.5646  0.7145  0.7284  0.7256  0.7531
9  French   0.7502  0.7212  0.7243  0.7803  0.7803

v23 Sanitized−Uncanny (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

          N=5     N=10    N=25    N=50    N=100
Avg Best  1.7781  1.9943  2.0676  2.0926  2.1024
Std Dev   0.7034  0.5020  0.4095  0.4193  0.4152
Avg Mean  1.4779  1.5784  1.5535  1.5420  1.5532
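The v23 Avg Best row shows the diminishing returns directly. Dividing each step's gain in Avg Best by the number of extra candidates gives the marginal reward per additional sample; the sketch below does that arithmetic on the row above (values copied, nothing else assumed; `marginal_gains` is just an illustrative helper name).

```python
# Avg Best row of v23 Sanitized-Uncanny (Large), copied from the table above.
NS = [5, 10, 25, 50, 100]
AVG_BEST = [1.7781, 1.9943, 2.0676, 2.0926, 2.1024]


def marginal_gains(ns, best):
    """Return (n0, n1, step_gain, gain_per_extra_candidate) for consecutive N."""
    out = []
    for i in range(1, len(ns)):
        step = best[i] - best[i - 1]           # improvement in Avg Best for this step
        per_candidate = step / (ns[i] - ns[i - 1])  # spread over the extra candidates
        out.append((ns[i - 1], ns[i], step, per_candidate))
    return out


for n0, n1, step, per_cand in marginal_gains(NS, AVG_BEST):
    print(f"N={n0:>3} -> {n1:>3}: +{step:.4f} ({per_cand:.5f} per extra candidate)")
```

Each step buys less: about 0.043 reward per extra candidate from N=5 to N=10, falling to about 0.0002 from N=50 to N=100. The N=5 to N=10 step alone accounts for roughly two thirds of the total N=5 to N=100 gain (0.2162 of 0.3243).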

Per-Prompt Best Reward by N

#  Lang     N=5     N=10    N=25    N=50    N=100
0  English  2.4311  2.4023  2.5125  2.5265  2.5265
1  French   2.4019  2.4348  2.4348  2.4348  2.4348
2  English  0.9111  0.9375  1.2582  1.2379  1.2582
3  German   2.3902  2.3837  2.3624  2.3877  2.3903
4  French   2.1664  2.1809  2.1879  2.1664  2.1879
5  French   0.4257  2.0618  2.3068  2.3068  2.3068
6  English  1.9406  2.0147  2.0267  2.1296  2.1296
7  German   1.2215  1.2507  1.4666  1.4666  1.4666
8  Spanish  1.6835  2.0902  2.0623  2.0603  2.1141
9  French   2.2091  2.1869  2.0582  2.2091  2.2091

v24 Sanitized−Uncanny (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

          N=5     N=10    N=25    N=50    N=100
Avg Best  1.5440  1.7067  1.7908  1.7898  1.8203
Std Dev   0.6416  0.5003  0.4445  0.4564  0.4510
Avg Mean  1.2747  1.3525  1.3344  1.3243  1.3333

Per-Prompt Best Reward by N

#  Lang     N=5     N=10    N=25    N=50    N=100
0  English  2.2525  2.2736  2.3386  2.3802  2.3802
1  French   1.9994  2.0276  2.0112  1.9730  2.0276
2  English  0.6968  0.6609  0.8407  0.8349  0.8407
3  German   2.2097  2.2192  2.2531  2.2524  2.2531
4  French   1.7903  1.8595  1.9609  1.8388  1.9651
5  French   0.3583  1.7593  1.8678  1.9718  1.9718
6  English  1.9158  1.9608  1.9931  2.0007  2.0080
7  German   1.1425  1.1106  1.3516  1.2785  1.3516
8  Spanish  1.3532  1.5629  1.6131  1.6459  1.6833
9  French   1.7219  1.6331  1.6775  1.7219  1.7219

Prompt #0 — English (Silicon Valley accent)

Language: English
Accent: Silicon Valley accent
Scored: 100/100
DramaBox Prompt
A young woman, possessing an extremely high fundamental frequency and bright, delicate harmonic texture, with a brisk, elevated momentum and a Silicon Valley accent; this is a pristine, high-quality studio voice recording with no background noise. She delivers the lines with a teasing lightness that occasionally borders on nervous energy, punctuated by small moments of genuine relief. (A brief, high-pitched Giggle escapes as she begins.) "Honestly, you think finding a solid Firestone review is that hard? Boggle, really. But look, that Lys thing actually worked." (She pauses, a subtle Contemplation washing over her features, then manages a slight, contained Chuckle.) "Just wait, I'll show you." She concludes with a soft, almost satisfied sigh, allowing the tension to dissipate.

Prompt #1 — French

Language: French
Scored: 100/100
DramaBox Prompt
High-pitched, delicately resonant, and possessing the slightly strained clarity of a young adult female soprano; the voice is bright and purely head-dominant, engineered for intimate projection.

Pauses briefly, gathering strength. "Malgré la profondeur de cette sombre forêt, je sens toujours cette confiance absolue en mon chemin, guidée par la lumière."
A slight, almost imperceptible hardening of tone. "Même au cœur de cette nuit insondable, ma boussole intérieure me montre la seule direction véritable."
She finishes, a note of unwavering certainty settling.

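The pace remains glacially slow throughout the utterance. The delivery conveys immense, quiet self-assurance.

English translation of the quoted French lines: "Despite the depth of this dark forest, I still feel this absolute confidence in my path, guided by the light." / "Even in the heart of this unfathomable night, my inner compass shows me the only true direction."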