MLOps in Manufacturing: Data Lake for efficient data processing

Current Situation

Efficiently managing large volumes of data is crucial for Machine Learning applications. A Data Lake built on an open table format such as Delta Lake or Apache Iceberg offers ACID transactions and versioned Parquet files for reliable data processing. Using MinIO as the object store enables scalable, high-performance data storage.
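
To make the idea of transactional, versioned data files more concrete, the following toy sketch keeps immutable data files next to an append-only commit log on the local file system. It is only an illustration under simplifying assumptions: the class name ToyVersionedTable and the file layout are hypothetical, the data files are JSON rather than Parquet, and real table formats handle metadata, concurrency, and object storage in far more detail.

```python
import json
import tempfile
import uuid
from pathlib import Path


class ToyVersionedTable:
    """Toy append-only commit log over immutable data files (illustrative only)."""

    def __init__(self, root):
        self.root = Path(root)
        self.log_dir = self.root / "_log"
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def latest_version(self):
        """Return the highest committed version, or -1 for an empty table."""
        versions = [int(p.stem) for p in self.log_dir.glob("*.json")]
        return max(versions, default=-1)

    def commit(self, rows):
        """Write rows to a new data file, then record it in the log."""
        version = self.latest_version() + 1
        # Data files are never overwritten: each commit gets a unique name.
        data_file = self.root / f"part-{uuid.uuid4().hex}.json"
        data_file.write_text(json.dumps(rows))
        entry = self.log_dir / f"{version:020d}.json"
        # Mode "x" raises FileExistsError if the entry already exists,
        # so two writers cannot both claim the same version number.
        with open(entry, "x") as f:
            json.dump({"add": data_file.name}, f)
        return version

    def read(self, version=None):
        """Return all rows visible at the given version (default: latest)."""
        if version is None:
            version = self.latest_version()
        rows = []
        for v in range(version + 1):
            entry = json.loads((self.log_dir / f"{v:020d}.json").read_text())
            rows.extend(json.loads((self.root / entry["add"]).read_text()))
        return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        table = ToyVersionedTable(tmp)
        table.commit([{"sensor": "s1", "value": 0.5}])
        table.commit([{"sensor": "s2", "value": 0.7}])
        print(table.read(version=0))  # rows as of the first commit
        print(table.read())           # rows as of the latest commit
```

Because a commit only becomes visible once its log entry exists, readers see either the complete commit or nothing, and older versions remain readable for reproducing a training run.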

Possible Task:

The objective of this project is to evaluate and implement a Data Lake in conjunction with MinIO as a central data platform for Machine Learning applications. The work should encompass the following key points:

  • Familiarization with Data Lakes, MinIO, and relevant Machine Learning concepts:
    • Understanding Data Lakes as a comprehensive data repository and their integration into Machine Learning workflows.
    • Exploring the features, scalability, and configuration options of MinIO as an object store.
  • Architecture and System Design:
    • Conceptualizing an architecture that utilizes Data Lakes and MinIO to provide and manage data for Machine Learning workflows.
    • Technology scouting (MinIO, Dremio, Apache Iceberg, Apache Spark, Delta Lake, etc.).
    • Selection of additional suitable tools (e.g., Apache Spark for data processing, Jupyter Notebooks for model development) and their integration into the platform.
  • Implementation of a Sample Application:
    • Developing a Proof-of-Concept application that utilizes Data Lakes for storing training data and model versions, using MinIO as the data storage.
    • Implementing data pipelines using Apache Spark for data processing and transformation.
  • Evaluation, Comparison, and Documentation:
    • Assessing the performance, scalability, and maintainability of the implemented solution compared to conventional approaches.
    • Documenting the implementation steps, challenges faced, and achieved results.
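
As a rough orientation for the proof of concept, the sketch below collects Spark settings that are commonly used to reach an S3-compatible store such as MinIO through the Hadoop S3A connector and to enable Delta Lake. Several points are assumptions rather than project requirements: the choice of Delta Lake over the other candidates, the helper name minio_delta_spark_conf, the endpoint, the credentials, and the bucket path in the comments. The hadoop-aws and delta-spark packages would also have to be available to Spark.

```python
def minio_delta_spark_conf(endpoint, access_key, secret_key, use_ssl=False):
    """Build Spark settings for S3A access to MinIO with Delta Lake enabled.

    Hypothetical helper for illustration; all values are placeholders.
    """
    return {
        # S3A connector settings pointing at the MinIO endpoint.
        "spark.hadoop.fs.s3a.endpoint": endpoint,
        "spark.hadoop.fs.s3a.access.key": access_key,
        "spark.hadoop.fs.s3a.secret.key": secret_key,
        # MinIO is usually addressed with path-style URLs.
        "spark.hadoop.fs.s3a.path.style.access": "true",
        "spark.hadoop.fs.s3a.connection.ssl.enabled": str(use_ssl).lower(),
        # Delta Lake SQL extension and catalog.
        "spark.sql.extensions": "io.delta.sql.DeltaSparkSessionExtension",
        "spark.sql.catalog.spark_catalog":
            "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    }


conf = minio_delta_spark_conf(
    endpoint="http://localhost:9000",  # assumed local MinIO instance
    access_key="EXAMPLE_KEY",          # placeholder credentials
    secret_key="EXAMPLE_SECRET",
)

# With PySpark installed, the settings could be applied roughly like this:
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder.appName("datalake-poc")
#   for key, value in conf.items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
#   df.write.format("delta").save("s3a://training-data/sensors")
```

Keeping the settings in a plain dictionary makes it easy to reuse them in notebooks and pipeline jobs and to swap the table format during technology scouting.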

This project offers the opportunity to extensively engage with modern data management and Machine Learning technologies, while acquiring practical skills in implementing and integrating various tools.

Application Guidelines

Interested candidates should have:

  • A strong interest in the convergence of manufacturing and automation technologies.
  • High motivation, quick learning ability, and a structured work approach.
  • Proficiency in scripting and automation languages (e.g., Python).

Applications, including a resume and any relevant qualifications, should be sent via email to tim.raffin@faps.fau.de.

Contact:

Tim Raffin, M.Sc. 

Categories:

Research area:

Type of thesis:

Bachelor's thesis, Diplom thesis, advanced seminar, Master's thesis, project thesis, student research project

Degree program:

Energy Technology, Computer Science, IPEM, Mechanical Engineering, Mechatronics, Industrial Engineering and Management

Contact:

Tim Raffin, M.Sc.

Department Maschinenbau (MB)
Lehrstuhl für Fertigungsautomatisierung und Produktionssystematik (FAPS)