This WG will study the data management lifecycle on scalable architectures through a synergistic approach that combines HPC and distributed computing, since future scalable systems will require sustainable data management to address the predicted exponential growth and complexity of digital information. The goal of this WG is to explore and rethink the relationship between the data management lifecycle and scalable architectures in order to pave the way towards sustainable ultrascale on the next generation of computing platforms. Significant emphasis will be placed on cross-fertilization between HPC and distributed computing in addressing challenges such as scaling the I/O stack, exposing and exploiting data locality, energy-efficient data management, and improving the scalability of big data applications and data analytics. Additionally, building on scalable data management, the Action seeks to build a multidisciplinary environment by attracting application scientists from different domains who face big data problems.
Key objectives: contributing to the evolution of the storage I/O stack towards higher levels of scalability and sustainability; data sharing/integration (globalization of data); improving the programmability of data management and analysis; improving the exploitation of data workload predictability and managing uncertainty through adaptivity.
Topics: synergies between HPC and distributed computing (e.g. HPC in the data clouds, MapReduce in HPC); scalability of the storage I/O stack, reducing bottlenecks and exploiting data organization to pave the way towards ultrascale file systems; the impact and integration of novel memory/storage technologies in large-scale systems (Storage Class Memory, Shingled Disks, 3D stacked memory); the role of data locality throughout the memory and storage hierarchy; predictive and adaptive data management for performance, elasticity, and resilience; energy-aware data management in big data analysis techniques and applications.
|Position paper: Data Storage for Big Data in the Exascale Era: Challenges and Prospects||Download|
|A data-aware scheduling strategy for workflow execution in clouds||Download|