# Finding the Weird Blade of Grass in the Haystack

At the **Large Hadron Collider (LHC)**, physicists try to find new particles. But why? And, more interestingly, how can they be distinguished from the billions upon billions of known particles created at the LHC?

### Looking for New Physics …

The **Standard Model of Particle Physics (SM for short)** has so far proven to be a very effective theory. All particles predicted by it have been found (the last one being the Higgs boson), and many measurements have confirmed the predictions of the SM. However, there are hints of physics **Beyond the Standard Model (BSM)**, phenomena that the SM cannot explain. Perhaps the most prominent of these is **Dark Matter (DM)**. In short, observations of distant galaxies indicate that they have more mass than one would expect from the number of stars in them. But on very tiny scales, too, we find things that the SM cannot explain. Take, for example, the neutrino, a particle that is produced e.g. in radioactive decays or nuclear fusion (e.g. in the sun) and only very rarely interacts with other particles. According to the SM, it should be massless. However, recent measurements have shown that it does have a mass. This mass is very tiny (and far too small to explain the DM we observe), but it is not zero, as the SM predicts. These are just two examples; a few other observations are also in conflict with the SM.

Of course, one now wonders why this is the case. Many theories have been developed that try to explain e.g. DM or neutrino masses. All of them have in common that they predict either new particles or a new interaction (which would then be mediated by a new particle). When searching for evidence of these new theories, e.g. at the LHC, one faces a huge challenge of scales: the whole LHC dataset contains billions upon billions of events, but if those BSM theories were true, they might change this number by only a few thousand. In other words, the task is to find a tiny deviation in a huge dataset. While this may seem hopeless at first, it is possible by exploiting the distinctive features of the new theory. Since it predicts new particles or interactions, its events look different in the detector from SM events. Maybe the particles have a higher momentum, or they fly in opposite directions, whereas in an SM process there would be no correlation between their directions.
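To put rough numbers on this challenge: if event counts follow a Poisson distribution, a sample of b background events fluctuates by about √b just by chance, and a common back-of-the-envelope significance for s extra signal events is s/√b. The sketch below uses purely hypothetical yields and selection efficiencies (assumptions for illustration, not LHC numbers) to show why a few thousand extra events vanish in a billion-event sample, and why a selection that keeps some of the signal while rejecting almost all of the background changes the picture.

```python
import math

def approx_significance(signal: float, background: float) -> float:
    """Rough significance s / sqrt(b); only sensible when b is large and s << b."""
    return signal / math.sqrt(background)

# Hypothetical yields, chosen only to illustrate the orders of magnitude.
n_background = 1e9  # ordinary (SM) events in the whole dataset
n_signal = 3_000.0  # extra events a BSM theory might add

# Without any selection, the signal is far smaller than the ~sqrt(b) fluctuations.
print(f"no selection:   s/sqrt(b) = {approx_significance(n_signal, n_background):.2f}")

# Hypothetical selection keeping 10% of the signal but only 0.001% of the background.
sig_eff, bkg_eff = 0.1, 1e-5
s, b = n_signal * sig_eff, n_background * bkg_eff
print(f"with selection: s/sqrt(b) = {approx_significance(s, b):.2f}")
```

The selection throws away 90% of the signal, yet the excess goes from invisible to roughly three times the size of the expected fluctuation.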

### … with a tangible analogy

This sounds complicated, but it may be easier to explain with an analogy. Let’s assume that you are looking for a new kind of grass on a meadow. The farmer has already cut the grass and put all the hay into his barn (“taking and storing the data”). You now think about the new grass you are looking for and work out that, once it is a bit dry, it should be a curved, yellow blade. This is different from ordinary grass, which is typically more or less straight and greenish. So, in principle, the task is now very easy: you just sit down in the barn and pick up every single blade of grass. If it looks bent and yellow, you put it in a bag; if it doesn’t, you are not interested and feed it to the cows. Of course, this is quite a task, so you have to train other people to help you (or, in the case of particle physics, tell a computer how to do it). So, you need some objective criteria to distinguish the two.

For the curvature, you simply make a jig: if the blade of grass is more curved than the jig, it probably is one of the new type. For the colour, you can take one of those colour charts from a home improvement store and draw a line on it: if the blade of grass is more yellow than the line, it probably is one of the new, exciting kind. Blades of grass that pass both tests are kept (they are in the “signal region”); all others are not interesting and are discarded.

Of course, deciding exactly where the thresholds for colour and curvature should be is not a trivial choice. You don’t know exactly how strongly curved or how yellow the new grass is. The theory only gives you a range of possible values, which is constrained both by existing measurements (“Nah, I’ve never seen grass that yellow,” says the farmer) and by the assumptions and approximations that were made when developing the theory (“It can’t be more curved than that, otherwise it’d be a tumbleweed.”). Such assumptions are often necessary to reduce the complexity of the calculations so that they can be solved at all. At the same time, some ordinary grass blades are more yellow or more curved than usual, while others are greener or straighter than one would expect. Typically, the ranges of possible colours and curvatures for new and ordinary grass overlap. So one has to decide carefully which selection criteria to apply: if they are too loose, the huge number of ordinary grass blades passing the selection hides the new grass. But if they are too tight, you risk that nothing passes your selection and you won’t discover anything either.
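The two-cut selection and the loose-versus-tight trade-off can be sketched as a toy model. Everything here is a labelled assumption: each blade is reduced to two hypothetical numbers, `curvature` and `yellowness` in arbitrary units, and ordinary and new grass are drawn from made-up, overlapping Gaussian distributions. The helper names (`make_blades`, `in_signal_region`, `count_selected`) are illustrative, not part of any real analysis framework.

```python
import random
from dataclasses import dataclass

@dataclass
class Blade:
    curvature: float   # hypothetical unit: 0 = perfectly straight
    yellowness: float  # hypothetical unit: 0 = green, 1 = fully yellow

def make_blades(n, mean_curv, mean_yellow, rng):
    """Draw toy blades from Gaussians (an illustrative assumption, not a real model)."""
    return [Blade(rng.gauss(mean_curv, 0.15), rng.gauss(mean_yellow, 0.15)) for _ in range(n)]

def in_signal_region(blade, min_curv, min_yellow):
    """The 'jig' and the 'line on the colour chart': a blade must pass both cuts."""
    return blade.curvature > min_curv and blade.yellowness > min_yellow

def count_selected(blades, min_curv, min_yellow):
    """Number of blades that end up in the bag for the given thresholds."""
    return sum(in_signal_region(b, min_curv, min_yellow) for b in blades)

rng = random.Random(1)  # fixed seed so the toy is reproducible
ordinary = make_blades(100_000, mean_curv=0.2, mean_yellow=0.3, rng=rng)  # many, straight-ish, green-ish
new_kind = make_blades(100, mean_curv=0.5, mean_yellow=0.6, rng=rng)      # few, curved, yellow

# Compare a loose, a medium and a tight selection (thresholds are arbitrary choices).
for label, cut in [("loose", 0.3), ("medium", 0.55), ("tight", 0.9)]:
    s = count_selected(new_kind, cut, cut)
    b = count_selected(ordinary, cut, cut)
    print(f"{label:>6} cut {cut:.2f}: {s:3d} new blades, {b:6d} ordinary blades")
```

With these made-up distributions, the loose cut keeps most of the new blades but buries them under thousands of ordinary ones, while the tight cut leaves the bag essentially empty; the useful threshold lies somewhere in between.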

Anyway, after careful consideration you come up with a minimum curvature and colour tone that a blade needs to pass your selection, and you go to work. And indeed, after sifting through the barn, there are a few blades of grass in your bag. But is this really a reason to get excited and claim the discovery of yellow grass blades? Now it is time to dust off your statistics knowledge and ask a few questions:

- Assuming that there are no yellow grass blades, how many would I expect to find in my bag?

  Since there are so many ordinary grass blades, some of them will be more bent than usual or more yellow than usual. If both happen at the same time, they may pass your selection criteria, even though they are of the known, ordinary type.

- How accurate is the above estimate?

  The number of collected (“measured”) grass blades can, just by chance, differ from the expected number. The former follows a Poisson distribution around the latter, so this statistical uncertainty is fairly easy to calculate. However, since you are working at the edge of knowledge, your calculation of how much ordinary grass could pass the selection may not be very reliable (“systematic uncertainty”). In practice, the accuracy of the estimate is checked with data in “control regions” and “validation regions”: if you select only straight grass blades, you don’t expect any of the new type there, so you can test whether you predict the colour of ordinary grass correctly.

- Is the number of grass blades in the bag *significantly* different from the expectation?

  In other words, the chance of observing this many grass blades if the new, curved and yellow ones don’t exist (that is, by pure chance) should be below some threshold. “Two sigma”, or a chance of roughly one in twenty, is a common choice. However, many searches for BSM physics are performed each year, and this criterion would lead to several false claims of discovery per year. Therefore, particle physics has agreed on “five sigma”, or a chance of about one in 3.5 million of being a random fluctuation, as the significance needed to claim a discovery.
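The statistical questions above can be made concrete for a simple counting experiment. This is a minimal sketch assuming that the number of ordinary blades in the bag is Poisson-distributed around a known expectation and ignoring systematic uncertainties; the function names and the example counts are hypothetical. The conversion between a probability and “sigma” uses the one-sided tail of a standard normal distribution, which is the convention behind the one-in-3.5-million figure for five sigma.

```python
import math
from statistics import NormalDist

def poisson_p_value(n_obs: int, background: float) -> float:
    """Chance of seeing n_obs or more blades when only ordinary grass exists,
    i.e. P(N >= n_obs) for N ~ Poisson(background)."""
    # P(N >= n) = 1 - P(N <= n - 1): sum the Poisson terms below n_obs.
    term = math.exp(-background)  # P(N = 0)
    cdf = 0.0
    for k in range(n_obs):
        cdf += term
        term *= background / (k + 1)  # P(N = k + 1) from P(N = k)
    return max(0.0, 1.0 - cdf)

def p_to_sigma(p: float) -> float:
    """Convert a one-sided tail probability (0 < p < 1) into 'sigma'."""
    return -NormalDist().inv_cdf(p)

def sigma_to_p(z: float) -> float:
    """One-sided tail probability of a standard normal beyond z sigma."""
    return NormalDist().cdf(-z)

print(f"5 sigma: one chance in {1 / sigma_to_p(5):,.0f}")

# Hypothetical example: 3.2 ordinary blades expected, 12 found in the bag.
p = poisson_p_value(12, 3.2)
print(f"p = {p:.2e}, i.e. {p_to_sigma(p):.1f} sigma")
```

A real analysis would also fold in the systematic uncertainty on the expected 3.2 blades, which generally lowers the significance.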

So you have finally finished analysing the data and evaluating the results. Most likely, you will not have discovered anything new, be it grass blades or particles. However, all this work was not in vain. You can still use the fact that you didn’t observe anything new to draw conclusions about the yellow, curved grass: the “exclusion limits”. Or, in simple words: if the new grass were this curved, this yellow and this common, we would have seen it.
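Turning a null result into an exclusion limit can be sketched for a single counting experiment. The example below computes a classical one-sided 95% upper limit on the number of new blades s: the smallest s for which observing n_obs or fewer blades would be less than 5% likely if s + b blades were expected. This construction is an illustrative choice, not necessarily what the searches described here use (LHC experiments typically rely on more elaborate methods such as CLs), and the function names and numbers are hypothetical.

```python
import math

def poisson_cdf(n: int, mean: float) -> float:
    """P(N <= n) for N ~ Poisson(mean)."""
    term, total = math.exp(-mean), 0.0
    for k in range(n + 1):
        total += term
        term *= mean / (k + 1)
    return total

def upper_limit(n_obs: int, background: float, cl: float = 0.95, tol: float = 1e-9) -> float:
    """Smallest signal s with P(N <= n_obs | s + background) <= 1 - cl, found by bisection.
    Works because poisson_cdf decreases monotonically as the mean grows."""
    alpha = 1.0 - cl
    lo, hi = 0.0, 1.0
    while poisson_cdf(n_obs, hi + background) > alpha:  # grow the bracket until s is excluded
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if poisson_cdf(n_obs, mid + background) > alpha:
            lo = mid  # mid is still compatible with the data
        else:
            hi = mid  # mid is excluded
    return hi

# Hypothetical: 3 blades found where ordinary grass alone predicts 2.5.
print(f"s < {upper_limit(3, 2.5):.2f} new blades at 95% CL")
```

One known weakness of this simple construction is that it can return a limit close to zero when fewer blades are found than expected, even though the search was not sensitive to such a small signal; avoiding this is one motivation for methods like CLs.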

In a real search for physics Beyond the Standard Model, things are of course a lot more complicated. One needs more than just two variables, and some of them are not measured directly but calculated from measured quantities. But the largest challenge is getting the background estimate right. Since we are probing the edge of our knowledge, the simulations may no longer be accurate and have to be validated with data. It is not trivial to choose control regions that are very similar to the signal regions (so that conclusions carry over easily) but are only very weakly influenced by BSM physics. For some backgrounds, we already know that simulations are not reliable. One can come up with ways to estimate these from data, but this, too, is difficult to do correctly.
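One widely used data-driven technique of this kind is the so-called ABCD method, shown here as an illustration rather than as the method any particular search relies on. In the grass analogy, the two cuts split the barn into four regions: A (curved and yellow, the signal region), B (curved but green), C (straight but yellow) and D (straight and green). If curvature and colour are independent for ordinary grass, and B, C and D contain hardly any new grass, then the ordinary-grass count in A can be estimated as N_A ≈ N_B · N_C / N_D. The counts below are invented, and the independence assumption is exactly the kind of thing that has to be checked in validation regions.

```python
import math

def abcd_estimate(n_b: int, n_c: int, n_d: int) -> tuple[float, float]:
    """Estimate the background in region A from control regions B, C and D.

    Assumes the two selection variables are independent for the background,
    so that N_A / N_B == N_C / N_D. Returns (estimate, statistical uncertainty),
    propagating only the Poisson uncertainties sqrt(N) of the three counts."""
    estimate = n_b * n_c / n_d
    rel_unc = math.sqrt(1 / n_b + 1 / n_c + 1 / n_d)
    return estimate, estimate * rel_unc

# Invented counts, for illustration only.
n_b = 400     # curved but green
n_c = 1_200   # straight but yellow
n_d = 48_000  # straight and green
est, unc = abcd_estimate(n_b, n_c, n_d)
print(f"expected ordinary blades in signal region A: {est:.1f} ± {unc:.1f}")
```

Any correlation between curvature and colour in ordinary grass would bias this estimate, which is why such methods are difficult to do correctly.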

But then, if it were easy, it would have been done already.