
The big data Universe. Literally.


Advances in technology over the last decade or so have driven exponential growth in astronomical data volumes. ESA's space telescope Euclid will gather high-resolution images of a third of the sky, downloading ~850 Gbit (about 106 GB) of data daily for 6 years; by 2032 the ground-based telescope LSST will have generated 500 PB of data; and the radio telescope SKA will produce more data per second than the entire internet worldwide. This talk addresses what techniques currently exist to handle big data volumes, how the astronomical community is preparing for this big data wave, and what other challenges lie ahead.


  1. The Dark Matter Mystery. The big data universe. Literally. Dr Maggie Lieu (@space_mog); Bruno Merin & Beatriz Martinez
  2. 1 black hole
  3. M87's supermassive black hole, 10,000,000× smaller
  4. 5× smaller. M87 (image scales: 0.5 deg, 0.1 deg)
  5. A telescope the size of the Earth! The Event Horizon Telescope (EHT): radio telescopes from around the world
  6. 5 PB of data (📸: Flora Graham)
  7. 1 black hole = 5000 TB; 1 TB/day; 8 TB/day
  8. Global average broadband: download 57.91 Mbit/s, upload 28.68 Mbit/s
  9. Global average broadband: download 57.91 Mbit/s, upload 28.68 Mbit/s. $\text{upload time} = \dfrac{\text{data volume}}{\text{rate}} = \dfrac{5\,000\,000\,000\ \text{MB} \times 8\ \text{Mbit/MB}}{28.68\ \text{Mbit/s}} \approx 44\ \text{years!}$
  10. European Space Agency
  11. 📸: MIT Haystack Observatory
  12. 2 years of processing (📸: Katie Bouman)
  13. But space data is different…
  14. Euclid ‣ Coming soon in 2022
  15. Gravitational lensing
  16. Weak gravitational lensing: background galaxies
  17. Foreground mass: more mass = more distorted galaxies!
  18. Real galaxy → shear → atmosphere & telescope blur → pixelised by detectors → noise (photo credit: CFHT)
  19. Euclid Wide Survey ‣ 15,000 deg² (sky map, Moon shown for comparison)
  20. Flagship simulations ‣ End-to-end simulations ‣ Largest simulated galaxy catalogue ever built ‣ 2 trillion dark matter particles
  21. Piz Daint, over 5000 GPU nodes ‣ Swiss National Supercomputing Centre, 6th fastest computer in the world ‣ 80 hrs ‣ 270,000 EUR
  22. ‣ 5000 deg² ‣ Raw simulation data: 0.4 PB ‣ Compressed to catalogues: Rockstar halo catalogue (5.5 TB) and 2D dark matter count maps (1 TB)
  23. Collecting the data (diagram: Sun, Earth, Moon and the Lagrange points L1 and L2; L2 lies 1,500,000 km from Earth)
  24. ESA deep space tracking stations: DSA-1, DSA-2, DSA-3 (ESA-ESOC)
  25. Telemetry ‣ Ground station: Cebreros, Spain ‣ 4 hr communication window ‣ Steerable K-band (26 GHz) ‣ X-band (8.5 GHz) ‣ Data rate: 850 Gbit/day ‣ On-board 4 Tbit flash memory
  26. Science centre: ESAC, Spain ‣ Former telemetry, tracking & commanding station ‣ Quick-look analysis, archiving and distribution
  27. Storage
  28. Attractive data: we are no longer in an era where we fight for data, but in an era where we choose data!
  29. Visualising the data ‣ ESASky ‣ HiPS maps, based on HEALPix ‣ Visualise TBs of data ‣ Render only kBs of data (Fernique et al. 2015)
  30. www.sky.esa.int
  31. Code-to-data platform ‣ Science exploration platform: Jupyter notebooks on Spark clusters ‣ Cloud computing: Amazon Web Services, Google Cloud, etc.
  32. Analysing the data ‣ Raw data is nasty! (see Lieu+18) ‣ With GBs of data per day, traditional methods are not efficient
  33. Machine learning
  34. Classification and detection ‣ K-means: characterising spectra of galaxies, Rahmani+18
  35. Classification and detection ‣ DBSCAN (core, border and noise points): stars in a star-forming region, Canovas+19
  36. Classification and detection ‣ K-nearest neighbours: defining types of supernovae, Lochner+16
  37. Classification and detection ‣ Decision tree (diagram: splits on T > 2, S > 5 and F > 0.3 leading to leaves A, B, C, D): finding weird galaxies, Baron+16
  38. Classification and detection ‣ Convolutional neural networks (& object detection): strong gravitational lensing, Schaefer+17; asteroids, Lieu+19
  39. Classification and detection ‣ Transfer learning
  40. Classification and detection ‣ Transfer learning: freeze the pretrained layers, replace the final layers
  41. Citizen science ‣ Outsource tasks to the general public ‣ Zooniverse platform: easy to build projects ‣ 100s of projects ‣ 250M classifications ‣ 2M volunteers
  42. Citizen science ‣ Classify galaxy morphologies: Lintott+08
  43. Citizen science ‣ Find star-forming bubbles: Kendrew+12
  44. Citizen science ‣ Discover glitches in gravitational wave signals: Zevin+17
  45. Data & model compression ‣ Neural networks & emulators: emulate the halo mass function with mixture density networks, Lieu+ in prep.; emulate cosmology with neural density estimators, Alsing+2019 ‣ Scaling relations with principal component analysis (PCA), Bothwell+2016
  46. Upcoming methods ‣ Edge computing: some form of it already in Gaia; Mars rovers
  47. Upcoming methods ‣ Continuous learning with GANs
  48. Upcoming methods ‣ Federated learning (diagram labels: super model, ESAC data)
  49. 10 PB of data is nothing…
  50. LSST ‣ First light: 2022 ‣ Duration: 10 years ‣ Data: 15 TB/day
  51. ‣ Search for transients (supernovae, asteroids, comets, gamma-ray bursts) ‣ Gravitational lensing (image: Pursiainen)
  52. Can it get any worse…?
  53. SKA: Square Kilometre Array ‣ First light: 2030 ‣ Data: 2 TB/s
  54. SKA-Mid: Karoo, South Africa ‣ SKA-Low: MRO (Murchison Radio-astronomy Observatory), Australia
  55. Centaurus A: optical (Hubble)
  56. Centaurus A: radio (VLA)
  57. Centaurus A: composite (Hubble/VLA/Chandra)
  58. Data volumes: EHT 5 PB ‣ Euclid 10 PB ‣ LSST 60 PB ‣ SKA 200,000,000 PB
  59. Biggest challenges in astronomy ‣ Collecting the data: retrieval, filtering good from bad ‣ Data storage ‣ Distributing the data: upload/download ‣ Combining data: complementary and multi-wavelength observations ‣ Data analysis: compression, source detection ‣ Visualising the data. Let's collaborate!
  60. Thank you for your attention
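
Slide 9's upload-time arithmetic can be checked with a few lines of Python. This is a minimal sketch. It assumes decimal units (5 PB = 5,000,000,000 MB), a constant rate equal to the quoted global average, and no protocol overhead. `upload_time_years` is a hypothetical helper name, not part of any library.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year

def upload_time_years(volume_mb: float, rate_mbit_s: float) -> float:
    """Years needed to transfer volume_mb megabytes at a constant rate in Mbit/s."""
    volume_mbit = volume_mb * 8  # 1 byte = 8 bits
    return volume_mbit / rate_mbit_s / SECONDS_PER_YEAR

# Assumption: decimal prefixes, so 5 PB = 5000 TB = 5,000,000,000 MB
eht_volume_mb = 5_000_000_000
years = upload_time_years(eht_volume_mb, 28.68)  # global average upload, slide 8
print(f"{years:.1f} years")  # prints 44.2 years
```

At the quoted download rate of 57.91 Mbit/s, the same 5 PB would still take roughly 22 years.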
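
Slides 16 to 18 describe how a foreground mass shears the images of background galaxies. The sketch below illustrates this with the standard weak-lensing Jacobian A = [[1 - κ - γ1, -γ2], [-γ2, 1 - κ + γ1]]. It assumes zero convergence (κ = 0), so the shear equals the reduced shear g. The code maps a circular source through A⁻¹ and measures the image ellipticity from its second moments. For a circular source, the measured ellipticity equals the applied shear. PSF blur, pixelisation and noise, the later steps on slide 18, are deliberately left out. The function names are illustrative.

```python
import math

def lensed_circle(g1, g2, n=360):
    """Image-plane outline of a unit circular source.

    Assumes kappa = 0. The lens equation beta = A theta is inverted as
    theta = A^-1 beta, with A = [[1 - g1, -g2], [-g2, 1 + g1]].
    """
    a, b, c, d = 1 - g1, -g2, -g2, 1 + g1
    det = a * d - b * c
    inv = (d / det, -b / det, -c / det, a / det)
    points = []
    for k in range(n):
        t = 2 * math.pi * k / n
        bx, by = math.cos(t), math.sin(t)  # point on the source circle
        points.append((inv[0] * bx + inv[1] * by, inv[2] * bx + inv[3] * by))
    return points

def ellipticity(points):
    """Ellipticity (e1, e2) from the second moments of points centred on the origin."""
    n = len(points)
    q11 = sum(x * x for x, _ in points) / n
    q22 = sum(y * y for _, y in points) / n
    q12 = sum(x * y for x, y in points) / n
    denom = q11 + q22 + 2 * math.sqrt(q11 * q22 - q12 * q12)
    return (q11 - q22) / denom, 2 * q12 / denom

# More mass -> larger shear -> more distorted image
for g in (0.01, 0.05, 0.1):
    e1, _ = ellipticity(lensed_circle(g, 0.0))
    print(f"g1 = {g:.2f} -> e1 = {e1:.4f}")
```

Real galaxies are not circular. In practice, shear is estimated by averaging many galaxy shapes so that their random intrinsic ellipticities cancel.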
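
Slides 25, 50 and 53 quote data rates in different units. The short sketch below puts them on a common footing. It assumes decimal prefixes, a 365.25-day year, and that Euclid's full daily volume must be downlinked within its single 4-hour window. It also ignores coding and protocol overhead. The constant names are illustrative.

```python
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25

# Euclid figures from slide 25
EUCLID_GBIT_PER_DAY = 850
EUCLID_WINDOW_HOURS = 4
EUCLID_ONBOARD_TBIT = 4

# Assumption: one day's data is emptied in one window, no overhead (Mbit/s)
euclid_downlink_mbit_s = EUCLID_GBIT_PER_DAY * 1_000 / (EUCLID_WINDOW_HOURS * 3_600)
# Days of data the on-board memory could hold if passes were missed
euclid_buffer_days = EUCLID_ONBOARD_TBIT * 1_000 / EUCLID_GBIT_PER_DAY

# LSST (slide 50): 15 TB/day for 10 years, in PB
lsst_total_pb = 15 * DAYS_PER_YEAR * 10 / 1_000

# SKA (slide 53): 2 TB/s, expressed as PB per day
ska_pb_per_day = 2 * SECONDS_PER_DAY / 1_000

print(f"Euclid downlink ~{euclid_downlink_mbit_s:.0f} Mbit/s, buffer ~{euclid_buffer_days:.1f} days")
print(f"LSST 10-year total ~{lsst_total_pb:.0f} PB")
print(f"SKA ~{ska_pb_per_day:.0f} PB per day")
```

Under these assumptions, LSST's ten-year raw total (~55 PB) is close to the 60 PB on slide 58. SKA, at 2 TB/s, would produce more in a single day (~173 PB) than LSST does in ten years.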
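
Slide 29's point, visualising terabytes while rendering only kilobytes, follows from the HEALPix hierarchy that HiPS is built on. At order k, the sphere is split into 12 × 4^k equal-area cells, and a viewer fetches only the tiles covering the current field of view at a suitable order. The sketch below uses only the standard HEALPix relations N_side = 2^k and N_pix = 12 N_side². It assumes 512 × 512 pixel tiles, a common HiPS choice. The helper names are hypothetical, and the tile count is a rough area-based lower bound that ignores tiles straddling the edge of the view.

```python
import math

FULL_SKY_DEG2 = 4 * math.pi * (180 / math.pi) ** 2  # about 41,253 deg^2

def healpix_npix(order: int) -> int:
    """Number of HEALPix cells at a given order: 12 * Nside^2 with Nside = 2^order."""
    nside = 2 ** order
    return 12 * nside * nside

def cell_area_deg2(order: int) -> float:
    return FULL_SKY_DEG2 / healpix_npix(order)

def tiles_in_view(fov_deg2: float, tile_order: int) -> int:
    """Rough lower bound on the number of tiles needed to cover a field of view."""
    return math.ceil(fov_deg2 / cell_area_deg2(tile_order))

def pixel_size_arcsec(tile_order: int, tile_width: int = 512) -> float:
    """Approximate pixel size, assuming tile_width x tile_width pixels per tile."""
    pixel_order = tile_order + int(math.log2(tile_width))  # 512 = 2^9
    return math.sqrt(cell_area_deg2(pixel_order)) * 3600

euclid_fraction = 15_000 / FULL_SKY_DEG2  # Euclid Wide Survey area, slide 19
print(f"Euclid Wide Survey covers {euclid_fraction:.0%} of the sky")
print(f"Order-6 tiles for a 1 deg^2 view: {tiles_in_view(1.0, 6)}")
print(f"Pixel size at tile order 6: {pixel_size_arcsec(6):.2f} arcsec")
```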
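
Slide 34 lists K-means for grouping galaxy spectra. What follows is a generic, dependency-free sketch of Lloyd's algorithm on made-up 2D points. It is not the pipeline of Rahmani+18. To make the result reproducible, it uses a deterministic farthest-point initialisation instead of random starts.

```python
import math

def farthest_point_init(points, k):
    """Pick the first point, then repeatedly the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids

def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))

def kmeans(points, k, max_iter=100):
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[j]
            for j, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, [nearest(p, centroids) for p in points]

# Hypothetical toy data: two well-separated groups
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)  # prints [(0.5, 0.5), (10.5, 10.5)] [0, 0, 0, 0, 1, 1, 1, 1]
```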
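
Slide 35 shows DBSCAN's three point types. A minimal O(n²) sketch in plain Python follows. A point is core if at least `min_pts` points, itself included, lie within `eps`. It is border if it is not core but lies within `eps` of a core point, and noise otherwise. The data and parameter values are made up for illustration and are unrelated to Canovas+19.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    n = len(points)
    # Neighbourhoods include the point itself
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)]
    is_core = [len(nb) >= min_pts for nb in neighbours]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not is_core[i]:
            continue
        # Grow a new cluster outwards from an unlabelled core point
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not is_core[p]:
                continue  # border points join a cluster but do not extend it
            for q in neighbours[p]:
                if labels[q] is None:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    labels = [NOISE if lab is None else lab for lab in labels]
    kinds = ["core" if is_core[i] else "noise" if labels[i] == NOISE else "border" for i in range(n)]
    return labels, kinds

# Hypothetical positions: a tight group, one straggler and one isolated point
points = [(0.0, 0.0), (0.0, 0.5), (0.5, 0.0), (0.5, 0.5), (1.2, 0.5), (10.0, 10.0)]
labels, kinds = dbscan(points, eps=0.75, min_pts=4)
print(labels)  # prints [0, 0, 0, 0, 0, -1]
print(kinds)   # prints ['core', 'core', 'core', 'core', 'border', 'noise']
```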
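
Slide 36 mentions K-nearest neighbours for supernova typing. The sketch below shows the plain majority-vote version with Euclidean distance. The two features and all training values are hypothetical stand-ins for light-curve features and are not taken from Lochner+16.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Label query by majority vote among its k nearest training points."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-feature training set (e.g. scaled rise time, decline rate)
train_x = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (3.0, 0.2), (3.2, 0.3), (2.9, 0.1)]
train_y = ["Ia", "Ia", "Ia", "II", "II", "II"]

print(knn_predict(train_x, train_y, (1.1, 1.0)))  # prints Ia
print(knn_predict(train_x, train_y, (3.1, 0.2)))  # prints II
```

An odd k avoids tied votes when there are two classes.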
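
The decision tree on slide 37 can be only partly recovered from the extracted text. Rather than reconstructing it, here is a small generic learner in the CART style. At each node it picks the feature and threshold that minimise the weighted Gini impurity, then recurses. This is an illustration of the technique, not the method of Baron+16. The feature values and class labels are made up.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y):
    """Return (weighted_gini, feature, threshold) of the best split, or None."""
    best = None
    for f in range(len(X[0])):
        for t in sorted(set(row[f] for row in X)):
            left = [y[i] for i in range(len(X)) if X[i][f] <= t]
            right = [y[i] for i in range(len(X)) if X[i][f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(X, y, max_depth=3):
    majority = Counter(y).most_common(1)[0][0]
    if len(set(y)) == 1 or max_depth == 0:
        return majority  # leaf: pure node or depth limit reached
    split = best_split(X, y)
    if split is None:
        return majority  # leaf: no split separates the samples
    _, f, t = split
    left = [i for i in range(len(X)) if X[i][f] <= t]
    right = [i for i in range(len(X)) if X[i][f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1),
    }

def predict(tree, x):
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree

# Hypothetical data: feature 0 is informative, feature 1 is not
X = [(1, 5), (2, 3), (3, 5), (4, 3), (5, 5), (6, 3)]
y = ["A", "A", "B", "B", "C", "C"]
tree = build_tree(X, y)
print(tree)
```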
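
Slide 38's convolutional neural networks are built from a few simple operations. The sketch below shows one convolution, ReLU and max-pooling pass in plain Python on a toy 5 × 5 image containing a single bright pixel. The 3 × 3 kernel is a hand-picked point-source detector, whereas a real CNN learns its kernels from training data. Like most deep-learning libraries, `conv2d` here computes a cross-correlation, so the kernel is not flipped.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation, the operation a CNN convolution layer computes."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]

def relu(fmap):
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling; leftover rows and columns are dropped."""
    return [[max(fmap[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]

image = [[0] * 5 for _ in range(5)]
image[2][2] = 1  # a single bright pixel, e.g. a point source
kernel = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]  # hand-picked, not learned

feature_map = relu(conv2d(image, kernel))
print(feature_map)            # prints [[0, 0, 0], [0, 8, 0], [0, 0, 0]]
print(max_pool(feature_map))  # prints [[8]]
```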
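
Slides 39 and 40 summarise transfer learning as "freeze" and "replace". Below is a toy version of that idea in plain Python. A hypothetical "pretrained" layer with fixed weights plays the role of the frozen network body, and only a new logistic-regression head is trained on its features. In practice, the frozen part would be a large network pretrained on another dataset. The weights and data here are made up.

```python
import math

# Hypothetical "pretrained" layer. Its weights are frozen: training below never changes them.
FROZEN_W = [[1.0, -1.0], [0.5, 0.5]]

def frozen_features(x):
    """ReLU(W x): stand-in for the body of a pretrained network."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in FROZEN_W]

def train_head(xs, ys, lr=0.5, epochs=500):
    """Train only a new logistic-regression head (the replaced layer) with SGD."""
    feats = [frozen_features(x) for x in xs]
    w, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        for f, target in zip(feats, ys):
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - target  # gradient of the log-loss with respect to z
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b

def predict(w, b, x):
    f = frozen_features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

# Hypothetical labelled examples for the new task
X = [(2.0, 0.0), (3.0, 1.0), (0.0, 2.0), (1.0, 3.0)]
y = [1, 1, 0, 0]
w, b = train_head(X, y)
print([predict(w, b, x) for x in X])  # prints [1, 1, 0, 0]
```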
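
Slides 41 to 44 rely on many volunteers classifying the same subject. One simple way to combine their answers is an unweighted majority vote that also reports the fraction of volunteers who agreed. This is an assumption, not Zooniverse's actual aggregation pipeline. Subjects with too few classifications are held back, and the subject IDs and labels are hypothetical.

```python
from collections import Counter

def aggregate(classifications, min_votes=3):
    """Majority label and agreement fraction per subject (unweighted vote)."""
    results = {}
    for subject, votes in classifications.items():
        if len(votes) < min_votes:
            continue  # wait for more volunteers
        label, count = Counter(votes).most_common(1)[0]
        results[subject] = (label, count / len(votes))
    return results

# Hypothetical volunteer answers per subject
classifications = {
    "subject_001": ["spiral", "spiral", "elliptical"],
    "subject_002": ["elliptical", "elliptical", "elliptical", "elliptical", "merger"],
    "subject_003": ["spiral"],
}
result = aggregate(classifications)
print(result)
```

The agreement fraction can flag subjects that volunteers find ambiguous for expert follow-up.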
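
Slide 45 mentions compressing scaling relations with principal component analysis. The sketch below finds the first principal component by power iteration on the sample covariance matrix. It then stores each 2D point as a single score, so two numbers per object become one, plus a shared mean and direction. The data are a made-up, perfectly linear relation, so the reconstruction is exact; real data would lose the variance in the discarded components. This is not the analysis of Bothwell+2016.

```python
import math

def first_principal_component(data, n_iter=200):
    """Mean and leading eigenvector of the sample covariance (power iteration)."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centred = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)] for i in range(d)]
    v = [1.0] * d  # start vector; must not be orthogonal to the leading eigenvector
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def compress(data, mean, v):
    """Project each point onto the component: d numbers become 1 score."""
    return [sum((row[j] - mean[j]) * v[j] for j in range(len(v))) for row in data]

def decompress(scores, mean, v):
    return [[mean[j] + s * v[j] for j in range(len(v))] for s in scores]

# Made-up scaling relation, e.g. log L = 2 log M + 1, with no scatter
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
mean, v = first_principal_component(data)
scores = compress(data, mean, v)
restored = decompress(scores, mean, v)
max_error = max(abs(a - b) for row, rec in zip(data, restored) for a, b in zip(row, rec))
print(f"component {v}, max reconstruction error {max_error:.1e}")
```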
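
Slide 48 shows federated learning, where data stay at each site and only model parameters travel to a shared model. Below is a minimal sketch of federated averaging (FedAvg) for a one-parameter linear model y = w·x. Each hypothetical client runs a few gradient steps on its own data. The server then averages the returned weights in proportion to each client's sample count. The client data, learning rate and round counts are illustrative assumptions.

```python
def local_update(w, data, lr=0.01, epochs=5):
    """A few full-batch gradient steps on one client's data (mean squared error)."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_averaging(clients, rounds=50, w=0.0):
    """Server loop: broadcast w, collect local updates, average weighted by sample count."""
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        local_ws = [local_update(w, d) for d in clients]
        w = sum(len(d) * lw for d, lw in zip(clients, local_ws)) / total
    return w

# Two hypothetical sites whose raw data never leave them (true relation y = 3x)
site_a = [(1.0, 3.0), (2.0, 6.0)]
site_b = [(3.0, 9.0), (4.0, 12.0), (5.0, 15.0)]
w = federated_averaging([site_a, site_b])
print(f"global model: y = {w:.3f} x")  # prints global model: y = 3.000 x
```

Only the scalar `w` crosses between sites in each round, which is the property that makes the approach attractive when data volumes are too large to move.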
