Scenema Audio + Chatterbox VC — Best-of-N: Diminishing Returns

10 Path A prompts × 100 candidates — 29 ranking methods — Page 1/5

Methodology: Ranking Method Formulas & Text Prompts

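All methods use reward = (1 − WER) × max(score, 0); only the score changes from method to method, as listed below. A trailing _L / _S in a key (and the l / s in clap_lq, clap_sq, clap_lp, clap_sp) marks the VoiceCLAP-Large or VoiceCLAP-Small encoder.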
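A minimal sketch of the best-of-N selection rule follows. It assumes each candidate already carries a WER and a method-specific score. The `Candidate` type and its field names are hypothetical, not taken from the report's code.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One generated take for a prompt (hypothetical container, not the report's type)."""
    audio_id: str
    wer: float    # ASR word error rate; can exceed 1.0, which makes (1 - WER) negative
    score: float  # method-specific score: Content Enjoyment, a CLAP cosine, or a pos-neg difference


def reward(c: Candidate) -> float:
    """reward = (1 - WER) * max(score, 0)."""
    return (1.0 - c.wer) * max(c.score, 0.0)


def best_of_n(candidates: list[Candidate]) -> Candidate:
    """Return the candidate with the highest reward."""
    return max(candidates, key=reward)


pool = [
    Candidate("take_a", wer=0.10, score=0.80),   # 0.90 * 0.80 = 0.72
    Candidate("take_b", wer=0.00, score=0.60),   # 1.00 * 0.60 = 0.60
    Candidate("take_c", wer=0.05, score=-0.20),  # negative score clipped to 0 -> 0.0
]
print(best_of_n(pool).audio_id)  # take_a
```

Because of the clip, a candidate with a negative score gets a reward of 0 whatever its WER.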

# | Key | Score Formula | Text Prompt(s)
0 | standard | Content Enjoyment | n/a
1 | clap_lq | cos(audio, quality_text) | "pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
2 | clap_sq | cos(audio, quality_text) | "pleasant, realistic, genuine, authentic, natural performance, high-quality recording"
3 | clap_lp | cos(audio, prompt) | Original DramaBox prompt
4 | clap_sp | cos(audio, prompt) | Original DramaBox prompt
5 | v1_nat_L | cos(audio, nat) | "natural, spontaneous, lifelike speech with genuine emotion"
6 | v2_auth_L | cos(audio, auth) | "authentic, emotionally truthful, deeply felt voice performance"
7 | v3_pro_L | cos(audio, pro) | "professional studio recording, crystal clear high-fidelity audio"
8 | v4_expr_L | cos(audio, expr) | "expressive, dynamic voice acting with rich emotional range"
9 | v5_cine_L | cos(audio, cine) | "immersive cinematic narration, compelling storytelling"
10 | v6_nat_S | cos(audio, nat) | "natural, spontaneous, lifelike speech with genuine emotion"
11 | v7_auth_S | cos(audio, auth) | "authentic, emotionally truthful, deeply felt voice performance"
12 | v8_pro_S | cos(audio, pro) | "professional studio recording, crystal clear high-fidelity audio"
13 | v9_nr_L | cos(audio, nat) − cos(audio, rob) | + "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
14 | v10_ac_L | cos(audio, auth) − cos(audio, cheap) | + "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
15 | v11_pd_L | cos(audio, pro) − cos(audio, dist) | + "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
16 | v12_ef_L | cos(audio, expr) − cos(audio, flat) | + "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
17 | v13_ff_L | cos(audio, full_pos) − cos(audio, full_neg) | + "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
18 | v14_wr_L | cos(audio, warm) − cos(audio, rob) | + "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
19 | v15_nr_S | cos(audio, nat) − cos(audio, rob) | + "natural, spontaneous, lifelike speech with genuine emotion" / − "robotic, mechanical, monotonous, synthetic computer speech"
20 | v16_ac_S | cos(audio, auth) − cos(audio, cheap) | + "authentic, emotionally truthful, deeply felt voice performance" / − "cheap, amateurish, rehearsed, stilted text-to-speech output"
21 | v17_pd_S | cos(audio, pro) − cos(audio, dist) | + "professional studio recording, crystal clear high-fidelity audio" / − "distorted, noisy, muffled, low-quality poor recording"
22 | v18_ef_S | cos(audio, expr) − cos(audio, flat) | + "expressive, dynamic voice acting with rich emotional range" / − "flat, lifeless, boring, emotionally dead recitation"
23 | v19_ff_S | cos(audio, full_pos) − cos(audio, full_neg) | + "natural spontaneous genuine authentic high-quality voice performance" / − "robotic distorted monotonous rehearsed cheap artificial synthetic"
24 | v20_wr_S | cos(audio, warm) − cos(audio, rob) | + "warm, pleasant, engaging conversational human voice" / − "robotic, mechanical, monotonous, synthetic computer speech"
25 | v21_san_L | cos(audio, sanitized_prompt) | Quoted speech removed (Large)
26 | v22_san_S | cos(audio, sanitized_prompt) | Quoted speech removed (Small)
27 | v23_snr_L | cos(audio, sanitized) − cos(audio, neg_san) | + sanitized prompt / − "robotic, distorted, uncanny" (Large)
28 | v24_snr_S | cos(audio, sanitized) − cos(audio, neg_san) | + sanitized prompt / − "robotic, distorted, uncanny" (Small)
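The pos−neg and sanitized-prompt rows can be sketched as follows, assuming audio and text embeddings have already been produced by a VoiceCLAP encoder. The sketch uses raw cosines. Rewards in the tables exceed 1 for single-prompt methods (e.g., 1.8732 for VoiceCLAP-Large × Prompt Match) and exceed 2 for pos−neg methods (e.g., 2.5757 for v16). Raw cosines multiplied by (1 − WER) ≤ 1 cannot produce those values, so the report presumably applies a scaling that is not stated here. The quote-stripping rule in `sanitize_prompt` is likewise a hypothetical stand-in for the report's sanitizer.

```python
import re

import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def contrastive_score(audio_emb: np.ndarray, pos_emb: np.ndarray, neg_emb: np.ndarray) -> float:
    """Pos-neg score: cos(audio, positive text) - cos(audio, negative text).

    Raw cosines only; any rescaling the report applies is not reproduced here.
    """
    return cosine(audio_emb, pos_emb) - cosine(audio_emb, neg_emb)


# Hypothetical sanitizer for the v21-v24 rows: drop double-quoted spans (straight or
# curly quotes) so only the delivery/scene description is embedded. The report's
# actual sanitization rule is not shown; this regex is an assumption.
_QUOTED = re.compile(r'"[^"]*"|\u201c[^\u201d]*\u201d')


def sanitize_prompt(prompt: str) -> str:
    without_quotes = _QUOTED.sub(" ", prompt)
    return re.sub(r"\s+", " ", without_quotes).strip()


prompt = 'A weary captain says "Hold the line!" over crackling radio static.'
print(sanitize_prompt(prompt))  # A weary captain says over crackling radio static.

# Toy 2-D embeddings: audio aligned with the positive text, orthogonal to the negative.
audio, pos, neg = np.array([1.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(contrastive_score(audio, pos, neg))  # 1.0
```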

Cross-Method Diminishing Returns Comparison

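Values are the average best reward over the 10 prompts at each N (the Avg Best rows of the detailed statistics).

Method N=5 N=10 N=25 N=50 N=100 Gain N=5→100 Knee Point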
Standard: (1−WER) × Content Enjoyment 3.9499 4.0503 4.1326 4.2322 4.3374 +0.3875 N=100
VoiceCLAP-Large × Quality Text 0.9211 0.9425 0.9569 0.9797 1.0083 +0.0872 N=50
VoiceCLAP-Small × Quality Text 0.7002 0.7131 0.7621 0.7904 0.8128 +0.1125 N=100
VoiceCLAP-Large × Prompt Match 1.3721 1.4216 1.4328 1.4692 1.4961 +0.1240 N=25
VoiceCLAP-Small × Prompt Match 0.7432 0.8048 0.8119 0.8366 0.8575 +0.1143 N=25
v1 Natural (Large) 0.9969 1.0212 1.0380 1.0663 1.0956 +0.0987 N=100
v2 Authentic (Large) 1.0189 1.0470 1.0676 1.0947 1.1260 +0.1071 N=100
v3 Professional (Large) 0.8686 0.8893 0.9063 0.9264 0.9556 +0.0870 N=50
v4 Expressive (Large) 1.0448 1.0798 1.1098 1.1203 1.1697 +0.1249 N=50
v5 Cinematic (Large) 0.9862 1.0159 1.0463 1.0487 1.0960 +0.1098 N=50
v6 Natural (Small) 0.7051 0.7428 0.7822 0.8010 0.8394 +0.1343 N=100
v7 Authentic (Small) 0.6484 0.6718 0.7154 0.7355 0.7643 +0.1159 N=100
v8 Professional (Small) 0.5983 0.6205 0.6340 0.6668 0.6859 +0.0876 N=100
v9 Natural−Robotic (Large) 1.7625 1.8124 1.8343 1.8778 1.9138 +0.1513 N=25
v10 Authentic−Cheap (Large) 1.8266 1.8804 1.9070 1.9459 1.9885 +0.1619 N=25
v11 Professional−Distorted (Large) 1.7341 1.7810 1.8064 1.8522 1.9012 +0.1671 N=25
v12 Expressive−Flat (Large) 1.7763 1.8355 1.8676 1.8895 1.9485 +0.1722 N=50
v13 FullPos−FullNeg (Large) 1.7583 1.8020 1.8264 1.8647 1.9145 +0.1561 N=25
v14 Warm−Robotic (Large) 1.6636 1.7092 1.7270 1.7695 1.8088 +0.1452 N=25
v15 Natural−Robotic (Small) 1.7629 1.8004 1.8645 1.8902 1.9578 +0.1949 N=50
v16 Authentic−Cheap (Small) 1.7445 1.7845 1.8444 1.8749 1.9338 +0.1893 N=50
v17 Professional−Distorted (Small) 1.5994 1.6570 1.6846 1.7406 1.7715 +0.1721 N=100
v18 Expressive−Flat (Small) 1.6577 1.7797 1.7808 1.8097 1.9099 +0.2522 N=25
v19 FullPos−FullNeg (Small) 1.5700 1.6128 1.6612 1.6989 1.7469 +0.1769 N=50
v20 Warm−Robotic (Small) 1.7081 1.7511 1.8092 1.8405 1.9087 +0.2006 N=50
v21 Sanitized Prompt (Large) 1.1298 1.1680 1.1868 1.2151 1.2431 +0.1134 N=100
v22 Sanitized Prompt (Small) 0.7170 0.7874 0.7996 0.8311 0.8531 +0.1360 N=100
v23 Sanitized−Uncanny (Large) 1.9141 1.9748 1.9965 2.0393 2.0803 +0.1662 N=25
v24 Sanitized−Uncanny (Small) 1.6078 1.7076 1.7431 1.7802 1.8432 +0.2353 N=50
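The methods report on different reward scales: Standard sits around 4, single-prompt CLAP methods around 1, and pos−neg methods around 2. The absolute Gain column therefore cannot be compared directly across methods. One option is to divide the gain by each method's N=5 value. The sketch below does this for five rows copied from the table. The choice of normalization is ours, not the report's.

```python
# N=5 and N=100 Avg Best values copied from the table above (a few methods only).
rows = {
    "Standard": (3.9499, 4.3374),
    "VoiceCLAP-Large x Quality Text": (0.9211, 1.0083),
    "v18 Expressive-Flat (Small)": (1.6577, 1.9099),
    "v23 Sanitized-Uncanny (Large)": (1.9141, 2.0803),
    "v24 Sanitized-Uncanny (Small)": (1.6078, 1.8432),
}


def gains(n5: float, n100: float) -> tuple[float, float]:
    """Return (absolute gain, gain relative to the N=5 value)."""
    delta = n100 - n5
    return delta, delta / n5


# Rank by relative gain so methods on different reward scales are comparable.
ranked = sorted(rows, key=lambda name: gains(*rows[name])[1], reverse=True)
for name in ranked:
    abs_gain, rel_gain = gains(*rows[name])
    print(f"{name:32s} {abs_gain:+.4f} {rel_gain:6.1%}")
```

On this view, v18 (Small) and v24 (Small) gain about 15% from N=5 to N=100 and Standard gains about 10%. v23 (Large), the highest-scoring CLAP-based method, gains about 9%.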

Diminishing Returns — All Methods Overlaid

[Figure: six panels, one per method family, each plotting that family's curves against N candidates (N=5, 10, 25, 50, 100): Original (5); Positive-only, Large (5); Positive-only, Small (3); Pos−Neg, Large (6); Pos−Neg, Small (6); Sanitized Prompt (4).]

Marginal Improvement per Additional Candidate

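Each cell gives the average gain in best reward per additional candidate over the interval, (best(N₂) − best(N₁)) / (N₂ − N₁). The number in parentheses is the total gain over the interval relative to the lower-N value, (best(N₂) − best(N₁)) / best(N₁).

Method N=5→10 N=10→25 N=25→50 N=50→100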
Standard: (1−WER) × Content Enjoyment 0.02008/cand (2.5%) 0.00549/cand (2.0%) 0.00398/cand (2.4%) 0.00210/cand (2.5%)
VoiceCLAP-Large × Quality Text 0.00428/cand (2.3%) 0.00096/cand (1.5%) 0.00091/cand (2.4%) 0.00057/cand (2.9%)
VoiceCLAP-Small × Quality Text 0.00258/cand (1.8%) 0.00327/cand (6.9%) 0.00113/cand (3.7%) 0.00045/cand (2.8%)
VoiceCLAP-Large × Prompt Match 0.00990/cand (3.6%) 0.00075/cand (0.8%) 0.00145/cand (2.5%) 0.00054/cand (1.8%)
VoiceCLAP-Small × Prompt Match 0.01232/cand (8.3%) 0.00047/cand (0.9%) 0.00099/cand (3.0%) 0.00042/cand (2.5%)
v1 Natural (Large) 0.00488/cand (2.4%) 0.00111/cand (1.6%) 0.00113/cand (2.7%) 0.00059/cand (2.7%)
v2 Authentic (Large) 0.00560/cand (2.7%) 0.00138/cand (2.0%) 0.00108/cand (2.5%) 0.00063/cand (2.9%)
v3 Professional (Large) 0.00415/cand (2.4%) 0.00113/cand (1.9%) 0.00080/cand (2.2%) 0.00058/cand (3.2%)
v4 Expressive (Large) 0.00700/cand (3.4%) 0.00200/cand (2.8%) 0.00042/cand (0.9%) 0.00099/cand (4.4%)
v5 Cinematic (Large) 0.00595/cand (3.0%) 0.00203/cand (3.0%) 0.00010/cand (0.2%) 0.00095/cand (4.5%)
v6 Natural (Small) 0.00753/cand (5.3%) 0.00263/cand (5.3%) 0.00075/cand (2.4%) 0.00077/cand (4.8%)
v7 Authentic (Small) 0.00468/cand (3.6%) 0.00290/cand (6.5%) 0.00080/cand (2.8%) 0.00058/cand (3.9%)
v8 Professional (Small) 0.00443/cand (3.7%) 0.00090/cand (2.2%) 0.00131/cand (5.2%) 0.00038/cand (2.9%)
v9 Natural−Robotic (Large) 0.00998/cand (2.8%) 0.00146/cand (1.2%) 0.00174/cand (2.4%) 0.00072/cand (1.9%)
v10 Authentic−Cheap (Large) 0.01076/cand (2.9%) 0.00177/cand (1.4%) 0.00156/cand (2.0%) 0.00085/cand (2.2%)
v11 Professional−Distorted (Large) 0.00938/cand (2.7%) 0.00170/cand (1.4%) 0.00183/cand (2.5%) 0.00098/cand (2.6%)
v12 Expressive−Flat (Large) 0.01183/cand (3.3%) 0.00214/cand (1.7%) 0.00088/cand (1.2%) 0.00118/cand (3.1%)
v13 FullPos−FullNeg (Large) 0.00874/cand (2.5%) 0.00163/cand (1.4%) 0.00153/cand (2.1%) 0.00099/cand (2.7%)
v14 Warm−Robotic (Large) 0.00913/cand (2.7%) 0.00119/cand (1.0%) 0.00170/cand (2.5%) 0.00079/cand (2.2%)
v15 Natural−Robotic (Small) 0.00752/cand (2.1%) 0.00427/cand (3.6%) 0.00103/cand (1.4%) 0.00135/cand (3.6%)
v16 Authentic−Cheap (Small) 0.00802/cand (2.3%) 0.00399/cand (3.4%) 0.00122/cand (1.7%) 0.00118/cand (3.1%)
v17 Professional−Distorted (Small) 0.01152/cand (3.6%) 0.00184/cand (1.7%) 0.00224/cand (3.3%) 0.00062/cand (1.8%)
v18 Expressive−Flat (Small) 0.02440/cand (7.4%) 0.00008/cand (0.1%) 0.00115/cand (1.6%) 0.00200/cand (5.5%)
v19 FullPos−FullNeg (Small) 0.00856/cand (2.7%) 0.00322/cand (3.0%) 0.00151/cand (2.3%) 0.00096/cand (2.8%)
v20 Warm−Robotic (Small) 0.00859/cand (2.5%) 0.00388/cand (3.3%) 0.00125/cand (1.7%) 0.00136/cand (3.7%)
v21 Sanitized Prompt (Large) 0.00765/cand (3.4%) 0.00125/cand (1.6%) 0.00113/cand (2.4%) 0.00056/cand (2.3%)
v22 Sanitized Prompt (Small) 0.01407/cand (9.8%) 0.00081/cand (1.5%) 0.00126/cand (3.9%) 0.00044/cand (2.6%)
v23 Sanitized−Uncanny (Large) 0.01214/cand (3.2%) 0.00145/cand (1.1%) 0.00171/cand (2.1%) 0.00082/cand (2.0%)
v24 Sanitized−Uncanny (Small) 0.01996/cand (6.2%) 0.00236/cand (2.1%) 0.00148/cand (2.1%) 0.00126/cand (3.5%)
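The following short sketch reproduces the Standard row from the N-to-best-reward curve. The per-candidate figure falls by about a factor of 10 between the first and last interval. The percentage stays near 2 to 2.5% because each interval is wider than the previous one. This is roughly the pattern expected if the best reward grew with log N.

```python
def marginal(curve: dict[int, float]) -> list[tuple[str, float, float]]:
    """For each consecutive pair of N values, return
    (interval label, gain per extra candidate, total gain relative to the lower N)."""
    ns = sorted(curve)
    out = []
    for n1, n2 in zip(ns, ns[1:]):
        delta = curve[n2] - curve[n1]
        out.append((f"N={n1}->{n2}", delta / (n2 - n1), delta / curve[n1]))
    return out


# Avg Best values for the Standard method, copied from the cross-method table.
standard = {5: 3.9499, 10: 4.0503, 25: 4.1326, 50: 4.2322, 100: 4.3374}
for label, per_cand, rel in marginal(standard):
    print(f"{label:10s} {per_cand:.5f}/cand ({rel:.1%})")
```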

Ablation: Pronunciation Suffix Effect

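Comparing N=10 without suffix vs. N=10 with suffix. Caution: outside the Standard row, the With Suffix columns repeat the same three values for every method in a family. Every single-prompt method shows 0.6545 / 0.7825 / 0.6742, and every pos−neg method shows 1.3091 / 1.5649 / 1.3485. Those columns, and the deltas computed from them, therefore likely do not reflect per-method scoring of the suffixed runs.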

Ranking Method | Mean (no suffix) | Best (no suffix) | Median (no suffix) | Mean (suffix) | Best (suffix) | Median (suffix) | Δ Mean | Δ Best
Standard: (1−WER) × Content Enjoyment | 3.2689 | 3.9994 | 3.4583 | 3.1703 | 3.8596 | 3.2474 | -0.0985 | -0.1398
VoiceCLAP-Large × Quality Text | 0.7617 | 0.9283 | 0.8028 | 0.6545 | 0.7825 | 0.6742 | -0.1071 | -0.1459
VoiceCLAP-Small × Quality Text | 0.5434 | 0.6768 | 0.5619 | 0.6545 | 0.7825 | 0.6742 | +0.1111 | +0.1057
VoiceCLAP-Large × Prompt Match | 1.1312 | 1.3906 | 1.2016 | 0.6545 | 0.7825 | 0.6742 | -0.4767 | -0.6081
VoiceCLAP-Small × Prompt Match | 0.6277 | 0.7815 | 0.6585 | 0.6545 | 0.7825 | 0.6742 | +0.0269 | +0.0009
v1 Natural (Large) | 0.8204 | 1.0064 | 0.8678 | 0.6545 | 0.7825 | 0.6742 | -0.1659 | -0.2239
v2 Authentic (Large) | 0.8421 | 1.0298 | 0.8928 | 0.6545 | 0.7825 | 0.6742 | -0.1876 | -0.2473
v3 Professional (Large) | 0.7172 | 0.8772 | 0.7569 | 0.6545 | 0.7825 | 0.6742 | -0.0627 | -0.0948
v4 Expressive (Large) | 0.8646 | 1.0551 | 0.9225 | 0.6545 | 0.7825 | 0.6742 | -0.2101 | -0.2726
v5 Cinematic (Large) | 0.8114 | 0.9955 | 0.8627 | 0.6545 | 0.7825 | 0.6742 | -0.1569 | -0.2130
v6 Natural (Small) | 0.5832 | 0.7281 | 0.6083 | 0.6545 | 0.7825 | 0.6742 | +0.0714 | +0.0544
v7 Authentic (Small) | 0.5299 | 0.6751 | 0.5559 | 0.6545 | 0.7825 | 0.6742 | +0.1247 | +0.1074
v8 Professional (Small) | 0.4637 | 0.5698 | 0.4835 | 0.6545 | 0.7825 | 0.6742 | +0.1908 | +0.2126
v9 Natural−Robotic (Large) | 1.4585 | 1.7750 | 1.5435 | 1.3091 | 1.5649 | 1.3485 | -0.1494 | -0.2100
v10 Authentic−Cheap (Large) | 1.5104 | 1.8412 | 1.6025 | 1.3091 | 1.5649 | 1.3485 | -0.2013 | -0.2762
v11 Professional−Distorted (Large) | 1.4294 | 1.7589 | 1.5130 | 1.3091 | 1.5649 | 1.3485 | -0.1204 | -0.1940
v12 Expressive−Flat (Large) | 1.4713 | 1.7933 | 1.5638 | 1.3091 | 1.5649 | 1.3485 | -0.1622 | -0.2284
v13 FullPos−FullNeg (Large) | 1.4548 | 1.7743 | 1.5384 | 1.3091 | 1.5649 | 1.3485 | -0.1457 | -0.2094
v14 Warm−Robotic (Large) | 1.3816 | 1.6771 | 1.4605 | 1.3091 | 1.5649 | 1.3485 | -0.0725 | -0.1122
v15 Natural−Robotic (Small) | 1.4460 | 1.7797 | 1.5273 | 1.3091 | 1.5649 | 1.3485 | -0.1369 | -0.2147
v16 Authentic−Cheap (Small) | 1.4264 | 1.7975 | 1.5036 | 1.3091 | 1.5649 | 1.3485 | -0.1174 | -0.2326
v17 Professional−Distorted (Small) | 1.3229 | 1.6166 | 1.3960 | 1.3091 | 1.5649 | 1.3485 | -0.0138 | -0.0517
v18 Expressive−Flat (Small) | 1.3873 | 1.7335 | 1.4635 | 1.3091 | 1.5649 | 1.3485 | -0.0782 | -0.1686
v19 FullPos−FullNeg (Small) | 1.2837 | 1.5825 | 1.3548 | 1.3091 | 1.5649 | 1.3485 | +0.0254 | -0.0175
v20 Warm−Robotic (Small) | 1.3895 | 1.7197 | 1.4676 | 1.3091 | 1.5649 | 1.3485 | -0.0804 | -0.1548
v21 Sanitized Prompt (Large) | 0.9259 | 1.1499 | 0.9847 | 0.6545 | 0.7825 | 0.6742 | -0.2713 | -0.3674
v22 Sanitized Prompt (Small) | 0.6045 | 0.7650 | 0.6294 | 0.6545 | 0.7825 | 0.6742 | +0.0500 | +0.0175
v23 Sanitized−Uncanny (Large) | 1.5768 | 1.9403 | 1.6740 | 1.3091 | 1.5649 | 1.3485 | -0.2677 | -0.3754
v24 Sanitized−Uncanny (Small) | 1.3486 | 1.6856 | 1.4263 | 1.3091 | 1.5649 | 1.3485 | -0.0395 | -0.1207

Per-Prompt Ablation: Standard Reward (N=10)

# | Lang | No Suffix Mean | No Suffix Best | With Suffix Mean | With Suffix Best | Δ Mean | Δ Best
0 | English | 3.4860 | 4.4661 | 3.3486 | 3.5739 | -0.1374 | -0.8922
1 | French | 4.0988 | 4.9332 | 4.6067 | 5.0536 | +0.5079 | +0.1204
2 | English | 1.1279 | 1.3485 | 1.1376 | 1.3552 | +0.0097 | +0.0067
3 | German | 4.9285 | 5.0369 | 4.9212 | 4.9992 | -0.0073 | -0.0377
4 | French | 4.1896 | 4.5076 | 3.7950 | 4.4085 | -0.3946 | -0.0991
5 | French | 2.4379 | 4.2481 | 1.8008 | 4.2462 | -0.6371 | -0.0019
6 | English | 3.1621 | 3.9004 | 3.0621 | 3.7563 | -0.1000 | -0.1441
7 | German | 2.2189 | 2.9709 | 2.2829 | 2.5923 | +0.0640 | -0.3786
8 | Spanish | 3.7877 | 4.5971 | 3.2769 | 4.6148 | -0.5108 | +0.0177
9 | French | 3.2513 | 3.9850 | 3.4716 | 3.9956 | +0.2203 | +0.0106
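With only 10 prompts, the paired per-prompt deltas above show how much of the aggregate effect is consistent across prompts. The sketch below averages the Δ Mean values, reproducing the −0.0985 in the Standard row of the ablation table, and counts how many prompts improve. It then applies an exact two-sided sign test. The sign test is our illustrative choice and not an analysis the report performs.

```python
from math import comb

# Per-prompt means (N=10, Standard reward) copied from the table above.
no_suffix = [3.4860, 4.0988, 1.1279, 4.9285, 4.1896, 2.4379, 3.1621, 2.2189, 3.7877, 3.2513]
with_suffix = [3.3486, 4.6067, 1.1376, 4.9212, 3.7950, 1.8008, 3.0621, 2.2829, 3.2769, 3.4716]

deltas = [w - n for n, w in zip(no_suffix, with_suffix)]
mean_delta = sum(deltas) / len(deltas)
improved = sum(d > 0 for d in deltas)
n = len(deltas)

# Exact two-sided sign test: ignores delta magnitudes; there are no ties in this data.
k = min(improved, n - improved)
p_value = min(1.0, 2 * sum(comb(n, i) for i in range(k + 1)) / 2**n)

print(f"mean delta = {mean_delta:+.4f}, improved {improved}/{n}, sign-test p = {p_value:.3f}")
```

Four of the ten prompts improve and p ≈ 0.75. On these prompts alone, the per-prompt evidence does not establish whether the suffix helps or hurts the mean.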

Standard: (1−WER) × Content Enjoyment — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 3.9499 4.0503 4.1326 4.2322 4.3374
Std Dev 1.0361 1.1191 1.1542 0.9862 0.8503
Avg Mean 3.3231 3.3387 3.2637 3.2617 3.2778

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 4.3500 4.4322 4.7383 4.7751 4.8316
1 French 4.8991 4.7082 4.9703 5.0533 5.0533
2 English 1.9858 1.3609 1.3513 2.0546 2.6745
3 German 5.0848 5.0334 5.0562 5.0881 5.0943
4 French 4.4485 4.5793 4.7352 4.6121 4.7352
5 French 3.5508 4.8004 4.7510 4.7510 4.8004
6 English 3.6361 3.8367 3.8307 3.8307 3.9862
7 German 2.4792 2.9760 2.9955 3.0226 3.0226
8 Spanish 4.7640 4.7223 4.4861 4.7223 4.7640
9 French 4.3008 4.0539 4.4119 4.4119 4.4119
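The Avg Best row is the mean of the per-prompt bests; for N=5, the ten values in the table above average to 3.9499. Those per-prompt bests are not monotonic in N. Prompt 1 (French) drops from 4.8991 at N=5 to 4.7082 at N=10, and prompt 2 (English) drops from 1.9858 to 1.3609. This suggests that each N uses a single random subset of the 100 candidates rather than an expectation. An expected best-of-N can instead be computed exactly from the full pool of M rewards sorted in ascending order. The i-th smallest reward is the maximum of a uniformly drawn N-subset with probability C(i−1, N−1) / C(M, N). The sketch applies that order-statistic formula. It is an alternative estimator, not the report's known method, and it is non-decreasing in N by construction.

```python
from math import comb


def expected_best_of_n(rewards: list[float], n: int) -> float:
    """Exact E[max] of n candidates drawn uniformly without replacement from the pool.

    With rewards sorted ascending as r_1 <= ... <= r_M, the i-th smallest is the
    maximum of a random n-subset with probability C(i-1, n-1) / C(M, n).
    """
    m = len(rewards)
    if not 1 <= n <= m:
        raise ValueError("n must be between 1 and the pool size")
    r = sorted(rewards)
    total = comb(m, n)
    return sum(r[i - 1] * comb(i - 1, n - 1) for i in range(n, m + 1)) / total


pool = [1.0, 2.0, 3.0, 4.0]
print([round(expected_best_of_n(pool, k), 4) for k in range(1, 5)])  # [2.5, 3.3333, 3.75, 4.0]
```

At n=1 this reduces to the pool mean, and at n=M it reduces to the pool maximum. Those two limits correspond to the Avg Mean row and to the N=100 column.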

VoiceCLAP-Large × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.9211 0.9425 0.9569 0.9797 1.0083
Std Dev 0.2484 0.2534 0.2678 0.2269 0.1915
Avg Mean 0.7732 0.7865 0.7630 0.7635 0.7672

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.0773 1.0829 1.1419 1.1794 1.1794
1 French 1.1405 1.0858 1.1414 1.1502 1.1502
2 English 0.5424 0.3723 0.3766 0.5424 0.7031
3 German 1.1659 1.1632 1.1786 1.1740 1.1786
4 French 1.1891 1.2254 1.2540 1.2254 1.2540
5 French 0.7087 0.9286 0.8981 0.8981 0.9286
6 English 0.8410 0.9019 0.9239 0.9076 0.9239
7 German 0.5704 0.6978 0.7206 0.7206 0.7654
8 Spanish 0.8780 0.8871 0.8216 0.8871 0.8871
9 French 1.0975 1.0797 1.1122 1.1122 1.1122

VoiceCLAP-Small × Quality Text — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.7002 0.7131 0.7621 0.7904 0.8128
Std Dev 0.2619 0.2686 0.2662 0.2568 0.2409
Avg Mean 0.5708 0.5822 0.5655 0.5663 0.5663

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.7558 0.7150 0.7519 0.8193 0.8193
1 French 0.8168 0.8367 0.9473 0.9586 0.9910
2 English 0.3473 0.2701 0.2792 0.3510 0.4441
3 German 0.9741 1.0200 1.0530 1.0835 1.0835
4 French 0.9963 1.1118 1.0419 1.1118 1.1118
5 French 0.4261 0.4036 0.5839 0.5839 0.5839
6 English 0.6510 0.6714 0.7872 0.7143 0.7872
7 German 0.3838 0.5485 0.5099 0.5820 0.5820
8 Spanish 0.5974 0.6164 0.6089 0.6460 0.6618
9 French 1.0538 0.9380 1.0582 1.0538 1.0631

VoiceCLAP-Large × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.3721 1.4216 1.4328 1.4692 1.4961
Std Dev 0.4051 0.4346 0.4325 0.3975 0.3458
Avg Mean 1.1526 1.1629 1.1362 1.1359 1.1405

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.5854 1.6481 1.7236 1.7487 1.7487
1 French 1.8114 1.7128 1.8114 1.8295 1.8295
2 English 0.6009 0.4222 0.4322 0.6009 0.8299
3 German 1.8505 1.8610 1.8732 1.8732 1.8732
4 French 1.5178 1.5404 1.5178 1.5404 1.5404
5 French 1.2678 1.6804 1.6614 1.6614 1.6804
6 English 1.2336 1.3489 1.3319 1.3631 1.3631
7 German 0.8176 0.9321 1.0047 1.0056 1.0056
8 Spanish 1.6013 1.6346 1.5167 1.6346 1.6346
9 French 1.4345 1.4352 1.4553 1.4345 1.4553

VoiceCLAP-Small × Prompt Match — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.7432 0.8048 0.8119 0.8366 0.8575
Std Dev 0.2636 0.2904 0.2888 0.2739 0.2613
Avg Mean 0.6323 0.6431 0.6310 0.6272 0.6293

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.0831 1.1955 1.2086 1.1961 1.2086
1 French 0.7432 0.7705 0.7957 0.7961 0.8134
2 English 0.3105 0.2363 0.2268 0.3107 0.3718
3 German 1.0727 1.0810 1.1113 1.1630 1.1630
4 French 0.7075 0.7617 0.7606 0.7637 0.7637
5 French 0.5903 0.9346 0.7457 0.8766 0.9346
6 English 1.0183 1.1058 1.1293 1.1214 1.1293
7 German 0.4189 0.5340 0.6233 0.6045 0.6233
8 Spanish 0.6782 0.6574 0.6805 0.6972 0.7299
9 French 0.8091 0.7712 0.8369 0.8369 0.8369

v1 Natural (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.9969 1.0212 1.0380 1.0663 1.0956
Std Dev 0.2571 0.2545 0.2695 0.2241 0.1802
Avg Mean 0.8320 0.8452 0.8195 0.8203 0.8246

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.1857 1.1952 1.2568 1.2876 1.2876
1 French 1.2315 1.1750 1.2315 1.2445 1.2445
2 English 0.6025 0.4230 0.4229 0.6025 0.8102
3 German 1.1574 1.1430 1.1649 1.1572 1.1726
4 French 1.2454 1.2471 1.2950 1.2807 1.2950
5 French 0.8118 1.0528 1.0213 1.0213 1.0528
6 English 0.9111 0.9781 0.9676 0.9781 0.9784
7 German 0.6136 0.7872 0.8525 0.8525 0.8758
8 Spanish 0.9475 0.9763 0.9047 0.9763 0.9763
9 French 1.2620 1.2348 1.2625 1.2620 1.2625

v2 Authentic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.0189 1.0470 1.0676 1.0947 1.1260
Std Dev 0.2615 0.2638 0.2730 0.2190 0.1773
Avg Mean 0.8521 0.8651 0.8396 0.8412 0.8456

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.0628 1.0670 1.1388 1.1636 1.1636
1 French 1.3066 1.2607 1.3066 1.3200 1.3200
2 English 0.5752 0.4125 0.4117 0.6001 0.8088
3 German 1.2259 1.2139 1.2329 1.2223 1.2363
4 French 1.2335 1.2342 1.2831 1.2528 1.2831
5 French 0.8896 1.1530 1.1364 1.1364 1.1530
6 English 0.8826 0.9831 0.9943 0.9831 0.9943
7 German 0.6352 0.7913 0.8354 0.8683 0.8683
8 Spanish 1.1407 1.1630 1.0670 1.1630 1.1630
9 French 1.2372 1.1908 1.2696 1.2372 1.2696

v3 Professional (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.8686 0.8893 0.9063 0.9264 0.9556
Std Dev 0.2310 0.2348 0.2530 0.2122 0.1816
Avg Mean 0.7296 0.7414 0.7192 0.7193 0.7230

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.0141 1.0347 1.0811 1.1173 1.1173
1 French 1.0660 1.0103 1.0754 1.0801 1.0869
2 English 0.4986 0.3644 0.3644 0.5153 0.6674
3 German 1.1199 1.1236 1.1369 1.1286 1.1369
4 French 1.1103 1.1361 1.1688 1.1361 1.1688
5 French 0.6536 0.8675 0.8342 0.8342 0.8675
6 English 0.8106 0.8538 0.9262 0.8932 0.9262
7 German 0.5695 0.6619 0.6619 0.6801 0.7060
8 Spanish 0.8302 0.8381 0.7731 0.8381 0.8381
9 French 1.0133 1.0030 1.0409 1.0409 1.0409

v4 Expressive (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.0448 1.0798 1.1098 1.1203 1.1697
Std Dev 0.2537 0.2871 0.2920 0.2333 0.1949
Avg Mean 0.8741 0.8862 0.8619 0.8627 0.8664

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.1818 1.2041 1.2898 1.2948 1.2948
1 French 1.3248 1.2933 1.3248 1.3150 1.3482
2 English 0.6020 0.4137 0.4137 0.6054 0.8385
3 German 1.1259 1.1303 1.1303 1.1303 1.1303
4 French 1.2500 1.2681 1.3472 1.2977 1.3472
5 French 0.9723 1.2897 1.2420 1.2420 1.2897
6 English 1.1058 1.1917 1.2326 1.1752 1.2326
7 German 0.5900 0.7144 0.8013 0.8013 0.8126
8 Spanish 1.1330 1.1583 1.0709 1.1583 1.1583
9 French 1.1626 1.1348 1.2452 1.1826 1.2452

v5 Cinematic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.9862 1.0159 1.0463 1.0487 1.0960
Std Dev 0.2490 0.2770 0.2902 0.2319 0.2100
Avg Mean 0.8194 0.8336 0.8095 0.8105 0.8141

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.0256 1.0255 1.1112 1.1020 1.1112
1 French 1.2755 1.1957 1.2755 1.2539 1.2755
2 English 0.5470 0.3853 0.3853 0.5470 0.7394
3 German 1.1468 1.1458 1.1464 1.1584 1.1584
4 French 1.2074 1.2290 1.3202 1.2269 1.3202
5 French 0.9483 1.2607 1.2090 1.2090 1.2607
6 English 0.9812 1.0776 1.1659 1.1002 1.1659
7 German 0.5633 0.6707 0.7077 0.7216 0.7216
8 Spanish 1.0284 1.0300 0.9650 1.0300 1.0300
9 French 1.1382 1.1388 1.1769 1.1382 1.1769

v6 Natural (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.7051 0.7428 0.7822 0.8010 0.8394
Std Dev 0.2490 0.2466 0.2551 0.2276 0.2172
Avg Mean 0.5952 0.6060 0.5918 0.5900 0.5929

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.4792 0.5143 0.5181 0.5728 0.5728
1 French 1.0104 1.0417 1.1146 1.1009 1.1587
2 English 0.3315 0.2907 0.2863 0.3804 0.4947
3 German 0.9029 0.9546 0.9873 0.9872 1.0199
4 French 0.8190 0.8630 0.8701 0.9605 0.9605
5 French 0.5005 0.5527 0.7225 0.7225 0.7225
6 English 0.6633 0.7047 0.8337 0.7120 0.8337
7 German 0.4840 0.6682 0.6294 0.6974 0.6974
8 Spanish 0.8113 0.7829 0.7950 0.8275 0.8690
9 French 1.0490 1.0547 1.0647 1.0490 1.0647

v7 Authentic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.6484 0.6718 0.7154 0.7355 0.7643
Std Dev 0.2243 0.2197 0.2322 0.2159 0.2032
Avg Mean 0.5389 0.5498 0.5341 0.5322 0.5360

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.4540 0.4522 0.4923 0.5184 0.5184
1 French 0.8982 0.9189 1.0325 1.0340 1.0933
2 English 0.3151 0.2946 0.2949 0.3762 0.4996
3 German 0.8777 0.9326 0.9656 0.9917 0.9917
4 French 0.7368 0.8217 0.8058 0.8405 0.8548
5 French 0.5138 0.5512 0.6930 0.6930 0.6930
6 English 0.5382 0.5364 0.6964 0.6203 0.6964
7 German 0.4153 0.5608 0.4886 0.5608 0.5608
8 Spanish 0.8523 0.8358 0.8042 0.8366 0.8523
9 French 0.8831 0.8142 0.8803 0.8831 0.8831

v8 Professional (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.5983 0.6205 0.6340 0.6668 0.6859
Std Dev 0.2116 0.2117 0.2131 0.2144 0.1982
Avg Mean 0.4916 0.4993 0.4819 0.4839 0.4848

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 0.8533 0.8135 0.8216 0.9215 0.9215
1 French 0.6250 0.6815 0.7290 0.7672 0.7672
2 English 0.3390 0.2751 0.2717 0.3544 0.4492
3 German 0.7857 0.8350 0.8485 0.8485 0.8485
4 French 0.8347 0.9167 0.8709 0.9167 0.9167
5 French 0.3013 0.3754 0.3971 0.3971 0.3971
6 English 0.5794 0.5949 0.6289 0.6330 0.6330
7 German 0.3944 0.4776 0.4529 0.4767 0.4776
8 Spanish 0.4833 0.4990 0.5164 0.5436 0.6166
9 French 0.7873 0.7363 0.8028 0.8093 0.8317

v9 Natural−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7625 1.8124 1.8343 1.8778 1.9138
Std Dev 0.4270 0.4701 0.4782 0.3921 0.3250
Avg Mean 1.4787 1.4975 1.4580 1.4581 1.4641

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.9710 2.0491 2.1476 2.1351 2.1476
1 French 2.1428 2.0394 2.1498 2.1792 2.1792
2 English 0.9406 0.6121 0.6323 0.9406 1.2229
3 German 2.1491 2.1612 2.1636 2.1636 2.1636
4 French 2.0645 2.0823 2.1428 2.1103 2.1428
5 French 1.5301 1.9780 1.9490 1.9490 1.9780
6 English 1.6344 1.7506 1.7401 1.7506 1.7506
7 German 1.1934 1.4529 1.5171 1.5171 1.5171
8 Spanish 1.8767 1.9098 1.7736 1.9098 1.9098
9 French 2.1225 2.0889 2.1268 2.1225 2.1268

v10 Authentic−Cheap (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.8266 1.8804 1.9070 1.9459 1.9885
Std Dev 0.4869 0.5073 0.5149 0.4312 0.3630
Avg Mean 1.5242 1.5477 1.5052 1.5075 1.5142

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.7902 1.8367 1.9457 1.9314 1.9457
1 French 2.3703 2.2671 2.3703 2.3819 2.3819
2 English 0.9469 0.6536 0.6631 0.9666 1.2856
3 German 2.2040 2.2057 2.2057 2.2149 2.2149
4 French 2.2093 2.2185 2.2718 2.2225 2.2718
5 French 1.6745 2.1506 2.1444 2.1444 2.1506
6 English 1.5897 1.7623 1.7631 1.7623 1.7631
7 German 1.1506 1.4217 1.4693 1.5019 1.5019
8 Spanish 2.1024 2.1049 1.9721 2.1049 2.1049
9 French 2.2286 2.1832 2.2646 2.2286 2.2646

v11 Professional−Distorted (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7341 1.7810 1.8064 1.8522 1.9012
Std Dev 0.4503 0.4617 0.4840 0.3898 0.3347
Avg Mean 1.4446 1.4704 1.4253 1.4278 1.4349

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.9732 2.0076 2.0894 2.1378 2.1378
1 French 2.1766 2.0661 2.1766 2.1824 2.1880
2 English 0.9795 0.7086 0.7086 1.0278 1.3325
3 German 2.0586 2.0397 2.0602 2.0567 2.0854
4 French 2.1923 2.2329 2.3106 2.2317 2.3106
5 French 1.4664 1.9309 1.8909 1.8909 1.9309
6 English 1.6002 1.7258 1.7735 1.7493 1.7735
7 German 1.0369 1.2607 1.2942 1.3620 1.3704
8 Spanish 1.7946 1.7953 1.6723 1.7953 1.7953
9 French 2.0625 2.0422 2.0878 2.0878 2.0878

v12 Expressive−Flat (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7763 1.8355 1.8676 1.8895 1.9485
Std Dev 0.4260 0.5065 0.5070 0.4145 0.3568
Avg Mean 1.4899 1.5074 1.4696 1.4693 1.4744

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 2.0280 2.1126 2.2107 2.1920 2.2107
1 French 2.2424 2.1651 2.2424 2.2355 2.2673
2 English 0.9147 0.5840 0.6014 0.9147 1.2084
3 German 2.0131 2.0551 2.0551 2.0551 2.0551
4 French 2.0520 2.0891 2.1766 2.0949 2.1766
5 French 1.6742 2.1944 2.1445 2.1445 2.1944
6 English 1.8062 1.9320 1.9528 1.9274 1.9528
7 German 1.1379 1.3116 1.4041 1.4041 1.4041
8 Spanish 1.9660 1.9985 1.8712 1.9985 1.9985
9 French 1.9285 1.9121 2.0168 1.9285 2.0168

v13 FullPos−FullNeg (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7583 1.8020 1.8264 1.8647 1.9145
Std Dev 0.4620 0.4861 0.5036 0.4216 0.3558
Avg Mean 1.4727 1.4948 1.4533 1.4549 1.4610

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.9608 2.0103 2.0847 2.1336 2.1336
1 French 2.2944 2.1780 2.2949 2.3020 2.3020
2 English 0.9360 0.6393 0.6423 0.9472 1.2411
3 German 2.0983 2.1017 2.1064 2.1075 2.1075
4 French 2.1395 2.1546 2.2458 2.1727 2.2458
5 French 1.5380 1.9911 1.9511 1.9511 1.9911
6 English 1.5882 1.7128 1.7285 1.7128 1.7504
7 German 1.0933 1.3085 1.3728 1.3728 1.4117
8 Spanish 1.8120 1.8251 1.7040 1.8251 1.8280
9 French 2.1227 2.0986 2.1334 2.1227 2.1334

v14 Warm−Robotic (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6636 1.7092 1.7270 1.7695 1.8088
Std Dev 0.4263 0.4658 0.4792 0.3978 0.3466
Avg Mean 1.4010 1.4205 1.3838 1.3839 1.3890

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.8779 1.9566 2.0789 2.0521 2.0789
1 French 2.0346 1.9460 2.0595 2.0815 2.0815
2 English 0.8646 0.5522 0.5703 0.8646 1.0986
3 German 2.1651 2.1746 2.1773 2.1773 2.2009
4 French 1.9420 1.9618 2.0369 1.9870 2.0369
5 French 1.4221 1.8388 1.8078 1.8078 1.8388
6 English 1.5420 1.6624 1.6211 1.6624 1.6624
7 German 1.0902 1.3163 1.3516 1.3516 1.3788
8 Spanish 1.7649 1.7784 1.6591 1.7784 1.7784
9 French 1.9325 1.9052 1.9078 1.9325 1.9325

v15 Natural−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7629 1.8004 1.8645 1.8902 1.9578
Std Dev 0.5017 0.4960 0.5355 0.4633 0.3993
Avg Mean 1.4708 1.4898 1.4513 1.4492 1.4561

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.6488 1.7065 1.7676 1.7916 1.8014
1 French 2.3153 2.2224 2.4053 2.3848 2.4490
2 English 0.8467 0.6330 0.6096 0.8785 1.1780
3 German 2.3092 2.3154 2.3200 2.3361 2.3361
4 French 2.0907 2.1790 2.2116 2.2459 2.2919
5 French 1.5033 1.7800 1.9643 1.9643 1.9643
6 English 1.6529 1.7269 1.9498 1.6928 1.9498
7 German 1.1440 1.4311 1.3928 1.4866 1.4866
8 Spanish 1.8794 1.8827 1.7967 1.8827 1.8827
9 French 2.2383 2.1274 2.2274 2.2383 2.2383

v16 Authentic−Cheap (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7445 1.7845 1.8444 1.8749 1.9338
Std Dev 0.5379 0.5445 0.5657 0.5210 0.4555
Avg Mean 1.4425 1.4595 1.4211 1.4186 1.4266

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.3809 1.4605 1.4941 1.5194 1.5194
1 French 2.4449 2.3596 2.4919 2.4988 2.5757
2 English 0.8119 0.6120 0.6047 0.8305 1.1598
3 German 2.4088 2.4501 2.4501 2.4869 2.4869
4 French 1.8841 2.0720 2.0840 2.0720 2.0906
5 French 1.6578 1.8901 2.0938 2.0938 2.0938
6 English 1.5176 1.5212 1.7346 1.5726 1.7346
7 German 1.1548 1.4413 1.3831 1.4581 1.4581
8 Spanish 2.1296 2.1277 2.0185 2.1277 2.1296
9 French 2.0541 1.9108 2.0891 2.0891 2.0891

v17 Professional−Distorted (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.5994 1.6570 1.6846 1.7406 1.7715
Std Dev 0.4479 0.4860 0.5001 0.4269 0.3855
Avg Mean 1.3421 1.3697 1.3318 1.3323 1.3378

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.8642 1.9530 2.0432 2.0555 2.0555
1 French 1.9058 1.9187 1.9901 2.0964 2.0964
2 English 0.8169 0.5120 0.5120 0.8169 1.0060
3 German 2.0186 2.0532 2.0532 2.0532 2.0625
4 French 2.0590 2.1292 2.1158 2.1036 2.1292
5 French 1.2630 1.7543 1.6779 1.6779 1.7543
6 English 1.5062 1.6104 1.6834 1.6756 1.6834
7 German 0.9686 1.1675 1.2249 1.2390 1.2390
8 Spanish 1.6446 1.6486 1.5433 1.6856 1.6856
9 French 1.9470 1.8230 2.0027 2.0027 2.0027

v18 Expressive−Flat (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6577 1.7797 1.7808 1.8097 1.9099
Std Dev 0.4096 0.5021 0.4790 0.4030 0.3495
Avg Mean 1.3920 1.4088 1.3715 1.3702 1.3765

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.8925 2.0090 2.0555 2.0892 2.0892
1 French 1.9245 1.9676 1.9677 2.1396 2.1396
2 English 0.8228 0.5462 0.5565 0.8322 1.1512
3 German 2.0265 1.9755 2.0265 1.9834 2.0265
4 French 1.8850 2.0415 2.1878 2.0415 2.1878
5 French 1.4913 2.2316 1.8938 1.9778 2.2316
6 English 1.6425 1.8139 1.7999 1.7818 1.8370
7 German 1.0728 1.2596 1.3945 1.3945 1.4380
8 Spanish 1.9426 1.9672 1.9413 1.9803 2.0134
9 French 1.8763 1.9847 1.9847 1.8763 1.9847

v19 FullPos−FullNeg (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.5700 1.6128 1.6612 1.6989 1.7469
Std Dev 0.3859 0.4233 0.4468 0.3841 0.3160
Avg Mean 1.3126 1.3291 1.2899 1.2885 1.2964

Per-Prompt Best Reward by N

# Lang N=5 N=10 N=25 N=50 N=100
0 English 1.6920 1.7246 1.8583 1.8986 1.8986
1 French 1.9208 1.8936 2.0274 2.0931 2.0931
2 English 0.8238 0.5746 0.5592 0.8238 1.1143
3 German 1.9430 2.0129 1.9983 1.9974 2.0437
4 French 1.8329 1.9385 1.9891 1.9638 1.9891
5 French 1.3100 1.6897 1.6879 1.6879 1.6897
6 English 1.4709 1.5158 1.6477 1.5318 1.6477
7 German 1.0837 1.2656 1.2837 1.3249 1.3249
8 Spanish 1.7527 1.7707 1.6895 1.7974 1.7974
9 French 1.8702 1.7422 1.8707 1.8707 1.8707

v20 Warm−Robotic (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.7081 1.7511 1.8092 1.8405 1.9087
Std Dev 0.5026 0.5152 0.5468 0.4675 0.4241
Avg Mean 1.4168 1.4319 1.3929 1.3934 1.3978

Per-Prompt Best Reward by N

#  Lang     N=5     N=10    N=25    N=50    N=100
0  English  1.9556  2.0282  2.0467  2.0415  2.1879
1  French   1.9558  1.9259  2.0562  2.0916  2.1705
2  English  0.8164  0.5619  0.5469  0.8164  1.0715
3  German   2.2923  2.2210  2.2923  2.2858  2.2971
4  French   2.1272  2.3172  2.3547  2.3162  2.3547
5  French   1.5113  1.7863  1.8355  1.8355  1.8355
6  English  1.5639  1.6882  1.8540  1.7028  1.8540
7  German   0.9827  1.2525  1.2931  1.3667  1.3667
8  Spanish  1.6662  1.7121  1.6222  1.7393  1.7393
9  French   2.2096  2.0173  2.1907  2.2096  2.2096
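
The names v18 Expressive−Flat, v19 FullPos−FullNeg and v20 Warm−Robotic suggest the contrastive form that the methodology table gives for v9–v12: score = cos(audio, positive) − cos(audio, negative). Their exact text prompts are not listed in this section. Below is a minimal NumPy sketch of that score. It assumes the audio and text embeddings share one CLAP-style joint space, and the 2-D toy vectors are hypothetical stand-ins for real embeddings.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def contrastive_score(audio: np.ndarray, pos: np.ndarray, neg: np.ndarray) -> float:
    # score = cos(audio, pos) − cos(audio, neg): favour closeness to the
    # positive text prompt, penalise closeness to the negative one.
    return cosine(audio, pos) - cosine(audio, neg)


# Toy 2-D embeddings (hypothetical; real CLAP embeddings are much larger).
pos = np.array([1.0, 0.0])    # stands in for a "warm" text embedding
neg = np.array([0.0, 1.0])    # stands in for a "robotic" text embedding
audio = np.array([3.0, 4.0])  # stands in for one clip's audio embedding

print(round(contrastive_score(audio, pos, neg), 4))  # -0.2
```

Because the reward clips the score with max(score, 0), a clip whose audio sits closer to the negative prompt than to the positive one, as in this toy case, receives zero reward regardless of its WER.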

v21 Sanitized Prompt (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.1298 1.1680 1.1868 1.2151 1.2431
Std Dev 0.3053 0.3328 0.3357 0.2999 0.2526
Avg Mean 0.9421 0.9517 0.9285 0.9285 0.9324

Per-Prompt Best Reward by N

#  Lang     N=5     N=10    N=25    N=50    N=100
0  English  1.4114  1.4243  1.5158  1.5312  1.5312
1  French   1.4362  1.3614  1.4413  1.4571  1.4571
2  English  0.5642  0.4081  0.4143  0.5642  0.7911
3  German   1.3646  1.3644  1.3609  1.3722  1.4092
4  French   1.2187  1.2284  1.2187  1.2284  1.2284
5  French   1.0601  1.4074  1.4024  1.4024  1.4074
6  English  1.1419  1.2705  1.2699  1.2712  1.2712
7  German   0.6296  0.7288  0.8082  0.8175  0.8175
8  Spanish  1.1832  1.2148  1.1333  1.2148  1.2148
9  French   1.2876  1.2721  1.3036  1.2924  1.3036
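
In several rows the per-prompt best reward drops as N grows. For example, prompt #2 under v21 falls from 0.5642 at N=5 to 0.4081 at N=10, so the N columns are not a running maximum of reward. The sketch below flags every such drop in a row, using the v21 values copied from the table above.

```python
NS = [5, 10, 25, 50, 100]

# Per-prompt best reward for v21 Sanitized Prompt (Large), copied from the table above.
v21 = {
    "0 English": [1.4114, 1.4243, 1.5158, 1.5312, 1.5312],
    "1 French":  [1.4362, 1.3614, 1.4413, 1.4571, 1.4571],
    "2 English": [0.5642, 0.4081, 0.4143, 0.5642, 0.7911],
    "3 German":  [1.3646, 1.3644, 1.3609, 1.3722, 1.4092],
    "4 French":  [1.2187, 1.2284, 1.2187, 1.2284, 1.2284],
    "5 French":  [1.0601, 1.4074, 1.4024, 1.4024, 1.4074],
    "6 English": [1.1419, 1.2705, 1.2699, 1.2712, 1.2712],
    "7 German":  [0.6296, 0.7288, 0.8082, 0.8175, 0.8175],
    "8 Spanish": [1.1832, 1.2148, 1.1333, 1.2148, 1.2148],
    "9 French":  [1.2876, 1.2721, 1.3036, 1.2924, 1.3036],
}


def drops(row: list[float]) -> list[tuple[int, int]]:
    # (N_before, N_after) pairs where the best reward fell as N increased.
    return [(NS[i], NS[i + 1]) for i in range(len(NS) - 1) if row[i + 1] < row[i]]


for prompt, row in v21.items():
    found = drops(row)
    if found:
        print(f"{prompt}: reward fell at {found}")
```

For v21, only prompts #0 and #7 are non-decreasing across all five values of N. Every other prompt dips at least once.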

v22 Sanitized Prompt (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 0.7170 0.7874 0.7996 0.8311 0.8531
Std Dev 0.2538 0.2969 0.2897 0.2738 0.2618
Avg Mean 0.6087 0.6219 0.6109 0.6070 0.6088

Per-Prompt Best Reward by N

#  Lang     N=5     N=10    N=25    N=50    N=100
0  English  1.1052  1.2145  1.2221  1.2389  1.2409
1  French   0.7607  0.7726  0.7963  0.8128  0.8128
2  English  0.3121  0.2286  0.2201  0.3121  0.3787
3  German   0.9288  0.9550  1.0054  1.0163  1.0163
4  French   0.6158  0.6584  0.6674  0.6933  0.6933
5  French   0.5652  1.0005  0.7901  0.9109  1.0005
6  English  1.0324  1.1345  1.1808  1.1834  1.1834
7  German   0.4282  0.5274  0.6373  0.6185  0.6373
8  Spanish  0.6784  0.6631  0.7082  0.7567  0.7997
9  French   0.7437  0.7195  0.7679  0.7679  0.7679
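
The diminishing returns in the page title can be quantified from the Avg Best row. Dividing each step's gain by the number of extra candidates it costs gives the reward bought per candidate. The sketch below uses the v22 Sanitized Prompt (Small) values copied from the summary table above.

```python
NS = [5, 10, 25, 50, 100]
avg_best = [0.7170, 0.7874, 0.7996, 0.8311, 0.8531]  # v22 Avg Best row


def marginal_gains(ns: list[int], rewards: list[float]) -> list[tuple[int, int, float, float]]:
    # For each step N0 -> N1: (N0, N1, absolute gain, gain per additional candidate).
    out = []
    for (n0, r0), (n1, r1) in zip(zip(ns, rewards), zip(ns[1:], rewards[1:])):
        gain = r1 - r0
        out.append((n0, n1, gain, gain / (n1 - n0)))
    return out


for n0, n1, gain, per_cand in marginal_gains(NS, avg_best):
    print(f"N={n0:>3} -> {n1:>3}: {gain:+.4f} total, {per_cand:.5f} per extra candidate")
```

For v22, the step from N=5 to N=10 gains about 0.014 per extra candidate. That single step supplies roughly half of the whole N=5 to N=100 improvement (0.0704 of 0.1361). The step from N=50 to N=100 gains only about 0.0004 per extra candidate. The per-candidate gain is not strictly decreasing, because N=25 to N=50 beats N=10 to N=25, so with ten prompts the curve is noisy.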

v23 Sanitized−Uncanny (Large) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.9141 1.9748 1.9965 2.0393 2.0803
Std Dev 0.5171 0.5552 0.5585 0.4952 0.4196
Avg Mean 1.6026 1.6198 1.5793 1.5795 1.5856

Per-Prompt Best Reward by N

#  Lang     N=5     N=10    N=25    N=50    N=100
0  English  2.3112  2.3760  2.4739  2.4916  2.4916
1  French   2.4160  2.2890  2.4269  2.4384  2.4384
2  English  0.9139  0.6370  0.6574  0.9139  1.2489
3  German   2.3492  2.3531  2.3492  2.3566  2.3954
4  French   2.1244  2.1459  2.1244  2.1459  2.1459
5  French   1.7536  2.2921  2.2855  2.2855  2.2921
6  English  1.8642  2.0281  2.0294  2.0294  2.0294
7  German   1.1252  1.3491  1.4267  1.4288  1.4288
8  Spanish  2.0750  2.0953  1.9713  2.0953  2.1119
9  French   2.2079  2.1821  2.2206  2.2079  2.2206
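
Comparing Avg Best with Avg Mean separates the benefit of selection from a method's overall reward level. The page does not define Avg Mean. Assuming it is the mean reward over the same pools of N candidates, the difference is the expected uplift of the method's pick over a randomly chosen candidate. The sketch below uses the v23 Sanitized−Uncanny (Large) values copied from the summary table above.

```python
NS = [5, 10, 25, 50, 100]
avg_best = [1.9141, 1.9748, 1.9965, 2.0393, 2.0803]  # v23 Avg Best row
avg_mean = [1.6026, 1.6198, 1.5793, 1.5795, 1.5856]  # v23 Avg Mean row


def selection_uplift(best: list[float], mean: list[float]) -> list[tuple[float, float]]:
    # (absolute uplift, uplift relative to the mean) of the selected candidate.
    return [(b - m, (b - m) / m) for b, m in zip(best, mean)]


for n, (abs_gain, rel_gain) in zip(NS, selection_uplift(avg_best, avg_mean)):
    print(f"N={n:>3}: +{abs_gain:.4f} over the mean ({rel_gain:.1%})")
```

For v23, the uplift grows from about 0.31 (19.4%) at N=5 to about 0.49 (31.2%) at N=100. Avg Mean stays near 1.58 to 1.62 throughout, so nearly all of the growth in Avg Best comes from selection.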

v24 Sanitized−Uncanny (Small) — Detailed Statistics

Expected Best Reward by N (averaged across all prompts)

N=5 N=10 N=25 N=50 N=100
Avg Best 1.6078 1.7076 1.7431 1.7802 1.8432
Std Dev 0.4618 0.5393 0.5341 0.4798 0.4386
Avg Mean 1.3624 1.3770 1.3471 1.3442 1.3516

Per-Prompt Best Reward by N

#  Lang     N=5     N=10    N=25    N=50    N=100
0  English  2.1217  2.2303  2.3251  2.3172  2.3251
1  French   1.9161  1.8960  1.9896  2.0428  2.0820
2  English  0.6843  0.4682  0.4413  0.6843  0.8463
3  German   2.1527  2.1854  2.1866  2.2201  2.2201
4  French   1.6717  1.7874  1.8827  1.7990  1.8827
5  French   1.3620  2.0900  1.7527  1.9163  2.0900
6  English  1.7763  2.0273  2.0737  2.0622  2.0737
7  German   1.0518  1.1949  1.3921  1.3324  1.4438
8  Spanish  1.6534  1.6348  1.5947  1.6348  1.6753
9  French   1.6882  1.5621  1.7926  1.7926  1.7926
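
v23 and v24 apply the Sanitized−Uncanny ranking with the Large and Small variants respectively. Their raw Avg Best values are not directly comparable if the two variants score on different scales. A ratio avoids that problem. The relative gain from N=5 to N=100 is unchanged by any common positive rescaling of the score: multiplying the score by c > 0 leaves the argmax pick unchanged and multiplies (1 − WER) × max(score, 0) by c. The sketch below uses the Avg Best rows copied from the two summary tables.

```python
def relative_gain(avg_best: list[float]) -> float:
    # Relative improvement of the largest N over the smallest N.
    # A common positive rescaling of the score (and hence the reward) cancels out.
    return avg_best[-1] / avg_best[0] - 1.0


v23_large = [1.9141, 1.9748, 1.9965, 2.0393, 2.0803]  # v23 Avg Best row
v24_small = [1.6078, 1.7076, 1.7431, 1.7802, 1.8432]  # v24 Avg Best row

for name, row in [("v23 Large", v23_large), ("v24 Small", v24_small)]:
    print(f"{name}: {relative_gain(row):+.1%} from N=5 to N=100")
```

This gives about +8.7% for v23 (Large) and about +14.6% for v24 (Small). By this measure the Small ranker gains proportionally more from a larger pool, although its Avg Best stays lower at every N.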

Prompt #0 — English (Silicon Valley accent)

Language: English | Accent: Silicon Valley | Scored: 100/100
DramaBox Prompt
A young woman, possessing an extremely high fundamental frequency and bright, delicate harmonic texture, with a brisk, elevated momentum and a Silicon Valley accent; this is a pristine, high-quality studio voice recording with no background noise. She delivers the lines with a teasing lightness that occasionally borders on nervous energy, punctuated by small moments of genuine relief. (A brief, high-pitched Giggle escapes as she begins.) "Honestly, you think finding a solid Firestone review is that hard? Boggle, really. But look, that Lys thing actually worked." (She pauses, a subtle Contemplation washing over her features, then manages a slight, contained Chuckle.) "Just wait, I'll show you." She concludes with a soft, almost satisfied sigh, allowing the tension to dissipate.

Prompt #1 — French

Language: French | Scored: 100/100
DramaBox Prompt
High-pitched, delicately resonant, and possessing the slightly strained clarity of a young adult female soprano; the voice is bright and purely head-dominant, engineered for intimate projection.

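Pauses briefly, gathering strength. "Malgré la profondeur de cette sombre forêt, je sens toujours cette confiance absolue en mon chemin, guidée par la lumière." [English: "Despite the depth of this dark forest, I still feel this absolute confidence in my path, guided by the light."]
A slight, almost imperceptible hardening of tone. "Même au cœur de cette nuit insondable, ma boussole intérieure me montre la seule direction véritable." [English: "Even in the heart of this unfathomable night, my inner compass shows me the only true direction."]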
She finishes, a note of unwavering certainty settling.

The pace remains glacially slow throughout the utterance. The delivery conveys immense, quiet self-assurance.