HPC Cluster Engineer
Posted 3 days 7 hours ago by Lawrence Harvey
Permanent
Not Specified
Other
Brussel, Belgium
Job Description
Your new company
The company is an IT consulting firm that provides a complete range of IT services geared towards supporting a large organisation in the automotive industry.
They pride themselves on being innovative and handling the needs of their customers in a professional way. Working in this company means being surrounded by qualified and technical people who keep adding value in an extremely competitive and ever-changing market.
Your new role
- You will be responsible for the administration of a Linux-based GPU HPC cluster for Artificial Intelligence (AI).
- You will support the VRED rendering cluster and the HPC cluster for Computer-Aided Engineering (CAE).
- You will maintain in-house shell scripts.
- You will investigate failed computations, diagnose problems, resolve incidents, provide system support, and coordinate with vendors.
- You will support and educate users with no Linux experience.
- Install, configure, and tune hardware, OS, and software for all R&D Linux workstations.
- Manage patching of Linux systems, including offline systems.
- Manage network aspects (DNS, DHCP, Internet access, etc.) with the Network Team.
- Perform daily monitoring and management of the backup environment (Ceph), and ensure cluster high availability.
- Support artificial intelligence engineers in setting up development environments on the GPU HPC cluster.
- Centralize environment management for the long term.
- Support the setup of a driving simulator based on a real-time OS.
- Support the setup of an integrated engineering development environment: Linux laptops, office workstations, in-car computers.
- Set up a patching environment, including for workstations without an Internet connection.
- Ensure the Linux environment matches company security standards.
- Collaborate with other technical teams and integrate Linux workstations into the AD domain.
- Deploy and maintain the AWS AI cluster.
- Support the AWS VRED and CAE clusters.
What you need to succeed
- You will need to be good at stress management and problem-solving.
- You will need well-structured verbal and written communication skills (in English).
- You will need technical knowledge of Linux OS and servers, cluster management, and infrastructure administration.
- You will need to understand and be able to operate storage solutions.
What you'll get in return
- Base salary + bonus based on performance
- Remote working allowance
- Company car and fuel card
- Phone plan
- Meal vouchers
- Pension contribution
- Hospitalisation insurance
Lawrence Harvey is acting as an Employment Business in regard to this position.