LHC performance has reached new heights; in turn, the amount of data produced by the underlying infrastructure and observation systems has increased by an order of magnitude, putting unprecedented load on the current data logging service. In this talk, Jakub introduces the core business of beam production at CERN and the motivations behind the evolution of the CERN Accelerator Logging Service towards Hadoop and Apache Spark.
11. Controls Data Logging
• Equipment readings (acquisitions or settings)
– Not physics data from experiments!
– Needed to operate the accelerators
• Online monitoring & ad-hoc queries
– Alarms, device/system failures
– Beam degradation or dumps
• Offline analytics & studies
– Beam quality improvements
– New beam types
– Future experiments or machines
12. CERN Accelerator Logging Service
• Old system (CALS) based on Oracle (2 DBs)
– ~20,000 logged devices (out of ~120,000 in total)
– 1,500,000 signals
– 5,000,000 extractions per day
– 71,000,000,000 records per day
• 2 TB / day (unfiltered data, 2 DBs)
– 1 PB of total data storage (heavily filtered: ~95% of data discarded)
13. Current Issues With CALS
• Performance / scalability problems
– Difficult to scale horizontally
– “… to extract 24h of data takes 12h”
• CALS & tools not ready for Big Data!
– Have to extract subsets of data to do analysis!
14. Upcoming Challenges
• Future big scientific projects
– High Luminosity LHC (HL-LHC)
– Future Circular Collider (FCC)
• Significant increase in data loads
– Data frequency increase (from 1 Hz to 100 Hz)
– Much bigger vectors
– Limited filtering
22. Future Directions
• Visualization for Analytics
• Service for Web based Analysis (SWAN)
– With NXCALS for Accelerator Controls data…
– …and Spark as first class citizen
• Unified Software Platform
• Interactive data analysis in the cloud
23. Conclusions
• Technology has to answer scientific needs
• Big Data is present in Accelerator Controls
• CERN started a sector-wide joint effort
– NXCALS (Controls Big Data Storage)
– SWAN (Analysis & Visualization)
• To promote and support science
• Spark is our choice for data extraction & analytics!
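To illustrate the kind of extraction Spark enables over the logged data, here is a minimal PySpark sketch. Everything specific in it is an assumption for illustration: the HDFS path, the `device`/`timestamp`/`value` columns, and the device name are hypothetical, not the real NXCALS schema or API.

```python
from datetime import datetime


def to_epoch_ns(ts: str) -> int:
    """Convert an ISO-8601 timestamp (with timezone) to epoch nanoseconds,
    the style of key commonly used for logged acquisitions."""
    return int(datetime.fromisoformat(ts).timestamp() * 1_000_000_000)


def extract_window(spark, start: str, end: str):
    """Average one device's signal over a time window.
    Paths, column names, and the device name below are illustrative only."""
    from pyspark.sql import functions as F  # imported lazily; needs a Spark cluster

    df = spark.read.parquet("hdfs:///nxcals/data/")  # hypothetical storage path
    return (df.where((F.col("timestamp") >= to_epoch_ns(start)) &
                     (F.col("timestamp") < to_epoch_ns(end)))
              .where(F.col("device") == "SPS.BCT")  # hypothetical device name
              .agg(F.avg("value").alias("avg_value")))
```

Because the data sits in partitioned Parquet files, a time filter like this can be pushed down to the storage layer and parallelized across the cluster, which is part of what made Spark attractive in the proof of concept.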
24. Links
• NXCALS - https://gitlab.cern.ch/acc-logging-team/nxcals
• Please come to the technical talk this afternoon!
• SWAN - https://swan.web.cern.ch/
• Science, accelerators, experiments
– https://greybook.cern.ch
– https://home.cern/about/accelerators
– PS virtual visit: https://goo.gl/1zqSTA
• Lectures from accelerator schools, summer courses
• http://cas.web.cern.ch/previous-schools
• https://indico.cern.ch/category/345/
• Beams Department, Controls Group - https://be-dep-co.web.cern.ch/
• Accelerators state - https://op-webtools.web.cern.ch/vistar/vistars.php?usr=SPS1
• Take Part: https://jobs.web.cern.ch/
Editor's notes
Speak about CERN, accelerators, the beam delivery process and how it relates to big data in Controls.
CERN is focused on research, technology, and education in the domain of fundamental physics: the Standard Model and the basic building blocks of matter, for which accelerators serve as giant microscopes.
Located at the Franco-Swiss border near Geneva.
Funded by 22 member states, with 2,500 employees and 10,000 users from all over the world, forming a unique international community and a very interesting environment to work in.
It is home to the LHC, the biggest and most complex accelerator in the world, in whose giant detectors we could prove the existence of the Higgs boson.
This is how it is laid out on the map: the big ring is the LHC, the smaller one is the SPS; the others can barely be seen. The complex stretches from the Jura mountains to Lake Geneva on the other side. You can also see the two main CERN sites.
The LHC is not the end of it; it might be just the beginning, as there are plans to build an even greater FCC, with a 100 km tunnel that will go under the lake and stretch from the Jura mountains to the Alps. This is all still in the planning stage; let's hope this accelerator will be built before we all retire.
Coming back to what CERN is now, this is a pictogram of the whole complex, with all the accelerators and all the experiments.
The biggest LHC-related ones are here, but it is not all about the LHC; there are a number of other accelerators and experimental sites.
The core business of CERN is to deliver a quality beam to those sites so that scientists can perform their experimental physics there.
It all starts with the hydrogen bottle from which the protons are extracted and accelerated towards the booster.
The Booster is the accelerator with four rings.
This is where the protons are grouped together into so called bunches.
In the PS, those bunches are shaped into their final form in terms of bunch spacing, bunch length, and bunch count.
The beam gets extracted towards the SPS, accelerated further and finally injected into the LHC.
The LHC has two beam pipes, and we accelerate the beams in opposite directions; later on we collide them at the experimental sites.
During this process the proton energy goes from 50 MeV to 6.5 TeV, which per proton is a very small energy, comparable to that of a flying mosquito.
But when you sum up all the protons in all the bunches, the total stored beam energy is comparable to that of an aircraft carrier cruising at 5 knots.
It takes around 20 minutes to fill the LHC with all the needed bunches, so the process is repeated many times.
At the same time the beam also gets delivered to other experiments.
An interesting fact is the number of events generated from collisions once the LHC enters the physics state.
How do we deliver the beam to different experiments at the same time?
Beam time is shared between experiments in so-called cycles: time intervals that are multiples of a 1.2-second basic period.
Each cycle, a new beam is created and delivered to a different experimental site.
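To make the cycle timing concrete, here is a tiny sketch of how cycle lengths derive from the 1.2-second basic period. The supercycle composition below is invented for illustration; real schedules vary.

```python
BASIC_PERIOD_S = 1.2  # every cycle length is a multiple of this basic period

# A hypothetical supercycle: cycle name -> length in basic periods.
supercycle = {
    "LHC filling": 6,
    "Fixed target": 9,
    "Machine development": 2,
}

# Duration of each cycle in seconds, and of one full pass through the supercycle.
durations_s = {name: n * BASIC_PERIOD_S for name, n in supercycle.items()}
total_s = sum(durations_s.values())
```

Each pass through the supercycle produces and delivers a different beam to a different destination, one cycle at a time.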
Now what are the building blocks for a typical accelerator? What equipment is needed?
CERN designs & produces that equipment
Those are the device readings needed to operate the machines; this is not the experiments' data, it comes well before we get into the physics there.
This is needed to do the online monitoring & controls: responding to the events that happen every second in the accelerator chain.
And to do various scheduled studies (offline analytics).
We are throwing away 95% of the data.
How do we extract a specific subset of the data? Let's take an example.
Let's take a look at the luminosity forecast (luminosity is a measurement, a figure of merit for accelerators).
We have this chart going up to 2038, but we are only here, at this red bar, at the end of 2017.
Here we are really still at a relatively flat bottom, and the increase is ahead of us.
But what does it actually mean for the Controls data-taking process?
Dramatic data growth now in Run 2 compared to Run 1.
We take 900 GB per day, compared with less than 200 GB in Run 1.
We are confronted with a situation our systems were not really prepared for.
We had to react and apply modern big data tools to move away from Oracle and build a next-generation logging service (NXCALS).
This is an example taken from the proof-of-concept results, where Spark showed itself to be one of the best tools on the market for this kind of extraction of Controls data.
We've observed the emerging adoption of Python and Jupyter notebooks happening at CERN and other institutes for data science and scientific computing.
That is setting the scene and showing directions for how we should present our Controls data.
We also wanted to make more use of the existing on-premise cloud, with its large number of cores, which up to now was not used much for beam Controls systems.
This is the new system architecture, where we use tools like Kafka to pump the data into Hadoop, into HBase and Parquet files.
We also use Spark and Jupyter to extract and present this data to our users.
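A minimal sketch of the ingestion side of such an architecture, assuming the kafka-python client; the topic name, broker address, and record schema are hypothetical, not the actual NXCALS implementation.

```python
import json


def to_record(device: str, value: float, ts_ns: int) -> bytes:
    """Serialize one device acquisition as a JSON payload for Kafka."""
    return json.dumps({"device": device, "value": value,
                       "timestamp": ts_ns}).encode("utf-8")


def publish(acquisitions, topic="nxcals.acquisitions"):  # topic name is illustrative
    """Push acquisitions to Kafka; downstream jobs would land them in
    HBase (recent data) and Parquet files (long-term storage)."""
    from kafka import KafkaProducer  # kafka-python; needs a running broker

    producer = KafkaProducer(bootstrap_servers="broker:9092")  # illustrative address
    for device, value, ts_ns in acquisitions:
        producer.send(topic, to_record(device, value, ts_ns))
    producer.flush()  # block until all buffered records are sent
```

The design point is the decoupling: devices publish to Kafka at acquisition rate, while storage and extraction (Spark over HBase/Parquet) scale out independently.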
Please do come to the technical presentation, where I go into the details of this system.
A lot more is coming: the visualization is taking shape through a new project at CERN called SWAN, based on Jupyter notebooks & Python.
We hope that together with NXCALS and Spark it will form a Unified Software Platform for interactive data analysis in the cloud.
It has a lot of potential, and we have high hopes for it to materialize. I do look forward to coming back and talking about how this comes together.
It would be a great thing to have this unification.