Responsibilities
- Developing adversarial and risky prompt strategies across several areas of abuse to expose potential vulnerabilities in models.
- Managing projects end-to-end, from initial planning and oversight through quality assurance to final delivery.
- Handling extensive datasets across multiple languages and areas of abuse, ensuring precision and meticulous attention to detail.
- Continuously investigating new tactics for circumventing foundation models' safety measures.
- Working alongside diverse teams (engineering, product, and policy) to tackle new challenges and craft forward-thinking strategies and resolutions.
- Promoting a culture of knowledge exchange and continual learning within the team.
Requirements
- Background in AI Safety, Responsible AI, and/or AI Ethics.
- Familiarity with recent Generative AI models and agents is essential, though direct technical experience is not a prerequisite.
- Command of English at a near-native level.
- Attention to detail, organizational capabilities, and the capacity to juggle numerous tasks concurrently.
Preferred Qualifications (Nice to Have)
- Experience with various model types (Text-to-Text, Text-to-Image).
- Prior experience with OSINT (Open Source Intelligence) will be considered an asset.
- A self-starter attitude, with the energy to excel in a fast-moving and variable environment.
About the Company
- ActiveFence is the leading provider of security and safety solutions for online experiences, safeguarding more than 3 billion users, top foundation models, and the world’s largest enterprises and tech platforms every day.
- As a trusted ally to major technology firms and Fortune 500 brands that build user-generated and GenAI products, ActiveFence empowers security, AI, and policy teams with low-latency Real-Time Guardrails and a continuous Red Teaming program that pressure-tests systems with adversarial prompts and emerging threat techniques.
- Powered by deep threat intelligence, unmatched harmful-content detection, and coverage of 117+ languages, ActiveFence enables organizations to deliver engaging and trustworthy experiences at global scale while operating safely and responsibly across all threat landscapes.
Category
AI & Machine Learning