There is a HIGGS dataset [1]. As the name suggests, it is designed to apply machine ...

dguest · 2026-03-28T15:05:21 1774710321

The LHC has moved on a bit since then. Here's an open dataset that one collaboration used to train a transformer:

https://opendata-qa.cern.ch/record/93940

If you can beat it with linear regression we'd be happy to know.

thesz · 2026-03-28T22:30:29 1774737029

Thanks.

The paper [1] referenced in your link follows the legacy of the paper on the HIGGS dataset and does not report quantities like accuracy and/or perplexity. The HIGGS dataset paper provided the area under the ROC curve, from which one had to approximate accuracy. I used the accuracy from the ADMM paper [2] to compare my results with. As I checked later, the area under the ROC curve in [1] mostly agrees with the SGD training results on HIGGS in [2].

  [1] https://arxiv.org/pdf/2505.19689
  [2] https://proceedings.mlr.press/v48/taylor16.pdf
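
To illustrate why accuracy can only be approximated from a ROC curve: accuracy at a single ROC point depends on the class balance as well as on the (TPR, FPR) pair. A minimal sketch, where the function name and the numbers are made up for illustration and are not taken from either paper:

```python
def accuracy_at_roc_point(tpr, fpr, signal_fraction):
    """Accuracy implied by one ROC point for a given signal fraction.

    accuracy = P(signal) * TPR + P(background) * (1 - FPR)
    """
    return signal_fraction * tpr + (1.0 - signal_fraction) * (1.0 - fpr)


# Hypothetical operating point, not a value from the HIGGS papers.
balanced = accuracy_at_roc_point(tpr=0.8, fpr=0.1, signal_fraction=0.5)
imbalanced = accuracy_at_roc_point(tpr=0.8, fpr=0.1, signal_fraction=0.1)

print(f"balanced classes:   {balanced:.2f}")
print(f"10% signal classes: {imbalanced:.2f}")
```

The same ROC point gives different accuracies for different class balances, so a reported AUC alone does not pin down a single accuracy number.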

I think perplexity is an appropriate measure in [1] because we need to discern between three outcomes. This calls for softmax, with perplexity as a standard measure.
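
For concreteness, here is a small sketch of what I mean by perplexity for a three-class softmax output: the exponential of the mean negative log-likelihood of the true class. The logits and labels below are invented for illustration and do not come from the dataset.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def perplexity(prob_rows, labels):
    """exp(mean negative log-likelihood of the true class)."""
    nll = [-math.log(p[y]) for p, y in zip(prob_rows, labels)]
    return math.exp(sum(nll) / len(nll))


# Hypothetical logits for three outcomes; labels are class indices 0..2.
logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3], [0.0, 0.0, 0.0]]
labels = [0, 1, 2]

probs = [softmax(row) for row in logits]
ppl = perplexity(probs, labels)

# A model that always answers 1/3, 1/3, 1/3 has perplexity equal to
# the number of classes, which gives a natural baseline.
uniform_ppl = perplexity([[1 / 3, 1 / 3, 1 / 3]] * 3, labels)

print(f"model perplexity:   {ppl:.3f}")
print(f"uniform perplexity: {uniform_ppl:.3f}")
```

A perfect classifier would score 1 and a uniform guess scores 3, so any target for this task has to lie between those two values.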

So, my questions are: 1) what perplexity should I target when dealing with the "mc-flavtag-ttbar-small" dataset? And 2) what is the train/validate/test split ratio there?

dguest · 2026-03-29T17:14:09 1774804449

For better or worse the people working on this don't really use perplexity or accuracy to evaluate models. The target is whatever you'd get for those metrics if you used the discriminants that were provided in the dataset (i.e. the GN2v01 values).

As for why accuracy and perplexity aren't reported: the experiments generally choose a threshold to consider something a "b-hadron" (basically picking a point along the ROC curve) and quantify the TPR and FPR at that point. There are reasons for this, mostly that picking a standard point lets them verify that the simulation actually reflects data. See, for example, the FPR [1] and TPR [2] "calibrations".
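
As a rough sketch of what "picking a point along the ROC curve" means in practice: fix a threshold on the discriminant, then count how much signal and background pass it. The scores, labels, threshold, and function name here are invented for illustration; this is not the experiments' calibration code.

```python
def rates_at_threshold(scores, is_signal, threshold):
    """Return (TPR, FPR) for the cut `score >= threshold`."""
    tp = sum(1 for s, y in zip(scores, is_signal) if y and s >= threshold)
    fp = sum(1 for s, y in zip(scores, is_signal) if not y and s >= threshold)
    n_sig = sum(1 for y in is_signal if y)
    n_bkg = len(is_signal) - n_sig
    return tp / n_sig, fp / n_bkg


# Hypothetical discriminant values: first four are signal, last four background.
scores = [0.95, 0.80, 0.65, 0.40, 0.90, 0.30, 0.20, 0.10]
is_signal = [True, True, True, True, False, False, False, False]

tpr, fpr = rates_at_threshold(scores, is_signal, threshold=0.5)
print(f"TPR = {tpr:.2f}, FPR = {fpr:.2f}")
```

Those two numbers at a fixed threshold are the quantities that then get compared between simulation and data.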

It's a good point, though: the physicists should probably try harder to report the standard metrics that the rest of the ML community uses.

[1]: https://arxiv.org/pdf/2301.06319

[2]: https://arxiv.org/abs/1907.05120

mpierini · 2026-04-01T12:52:17 1775047937

Perplexity, aka measuring how sure a network is about its answer. Which might be wrong. It would not pass peer review at any particle physics journal. (Real) science is about being right, not about being sure of itself.

mpierini · 2026-04-01T12:50:02 1775047802

And this problem is a joke compared to a real problem. We are talking about reducing a 40 MHz incoming data stream to 100 kHz, after which a second layer of real-time selection reduces the data to 1 kHz, which is then processed, cleaned, and elaborated into the high-level features that you have in that dataset. But if you think you can do better, apply for a CERN job, come here and enlighten us!
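
To put the rates quoted above in perspective, the reduction factors work out as follows. This is plain arithmetic on the three numbers in the comment; the variable names are only labels.

```python
# Rates quoted above, in Hz.
collision_rate_hz = 40_000_000   # 40 MHz incoming stream
first_stage_rate_hz = 100_000    # 100 kHz after the first selection
second_stage_rate_hz = 1_000     # 1 kHz after the second selection

first_stage_factor = collision_rate_hz // first_stage_rate_hz
second_stage_factor = first_stage_rate_hz // second_stage_rate_hz
overall_factor = collision_rate_hz // second_stage_rate_hz
kept_fraction = second_stage_rate_hz / collision_rate_hz

print(f"first stage keeps 1 in {first_stage_factor}")
print(f"second stage keeps 1 in {second_stage_factor}")
print(f"overall: 1 in {overall_factor} ({kept_fraction:.1e} of the input)")
```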