Job Description
AI DevOps Engineer
Location: Must be based in Santa Clara, CA
Position Overview: We are seeking an innovative and highly motivated AI DevOps Engineer (Contractor) to join our dynamic team. In this role, you will bridge the gap between development and operations in AI-focused projects, ensuring the seamless deployment, scalability, and reliability of AI and machine learning (ML) applications. You will design and manage the infrastructure, tools, and pipelines that enable data scientists and AI engineers to efficiently develop, train, and deploy models.
In addition, you will leverage external and internal AI tools to automate workflows from prompt to execution.
Responsibilities:
Infrastructure Automation and Management:
-
Design, implement, and manage cloud-based or on-premises infrastructure for AI/ML workflows.
-
Automate infrastructure provisioning, configuration, and scaling using tools like Terraform, Kubernetes, or equivalent.
-
Optimize compute resources, ensuring cost-effective and efficient use for training and inference.
-
CI/CD Pipelines for AI/ML Workflows:
-
Develop and maintain Continuous Integration/Continuous Deployment (CI/CD) pipelines tailored for AI/ML workflows.
-
Automate the testing, validation, and deployment of AI models into production.
-
Integrate with monitoring tools to ensure pipeline reliability and performance.
-
Model Deployment and Monitoring:
-
Deploy machine learning models to production environments using containerization (e.g., Docker) and orchestration platforms (e.g., Kubernetes).
-
Implement model monitoring for drift detection, performance, and real-time insights.
-
Collaborate with data scientists to establish A/B testing and versioning strategies.
-
Collaboration with Cross-functional Teams:
-
Work closely with data scientists, ML engineers, and software developers to streamline workflows.
-
Build tools and frameworks that accelerate experimentation and productionization of AI systems.
-
Security and Compliance:
-
Ensure data and model security by implementing robust access controls, encryption, and compliance with industry regulations.
-
Manage secure handling of sensitive data used in AI/ML training and testing.
-
Performance Optimization and Troubleshooting:
-
Identify bottlenecks in AI/ML workflows and optimize system performance.
-
Diagnose and resolve issues in the deployment pipeline or production environment.
Qualifications / Skills:
-
Education:
-
Bachelor's or Master's degree in Computer Science, Data Science, Engineering, or a related field.
-
New graduates are welcome.
-
Experience:
-
Familiarity with the fundamentals of artificial intelligence and machine learning.
-
Hands-on experience with open-source AI models is a plus but not required.
-
Proven ability to learn quickly.
-
Technical Skills:
-
Proficiency in scripting and automation languages or AI tools (Python, Bash, Jupyter Notebook, Golang, etc.).
-
Familiarity with at least one of the following: AI models (Hugging Face, Claude, etc.), ML frameworks (TensorFlow, PyTorch, etc.), or deployment tools (e.g., MLflow, Seldon, or TFX).
-
Solid understanding of computer algorithms, AI training, inference, and AI-powered use cases.
-
Experience with infrastructure-as-code (IaC) tools such as Terraform or Ansible is a plus.
-
Soft Skills:
-
Strong problem-solving and analytical skills.
-
Excellent communication and teamwork abilities.
-
Adaptability to a fast-paced, evolving environment.
- Dice Id: 10371948
- Position Id: EIT - 3889-4162-1733872892
Job Tags
Contract work, For contractors