About the Company
DriveNets is a leader in disaggregated high-scale networking solutions for service providers and AI infrastructures. Founded in December 2015, DriveNets created a radical new way to build networks by adapting the architectural model of the cloud to telco-grade networking. This solution accelerates network deployment, improves the network’s economic model, and makes network operations much simpler. Customers include Comcast, Orange, and KDDI. Over 80% of AT&T’s network traffic now runs through a disaggregated core powered by DriveNets software. The DriveNets Network Cloud-AI solution, based on the same technology, was introduced to the market in 2023, providing the highest-performance Ethernet-based AI networking solution, and is already deployed by Hyperscalers, NeoClouds, and Enterprises. Having raised over $587 million in three funding rounds, DriveNets continues to deploy the most advanced network infrastructure and is looking for the most talented people to be part of this.
Responsibilities
- Build strong AI/HPC infrastructure for new and existing customers.
- Take a hands-on technical role in building and supporting NVIDIA/AMD-based platforms.
- Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, training stability, real-time monitoring, logging, and alerting.
- Administer Linux systems, ranging from powerful GPU-enabled servers to general-purpose compute systems.
- Design and plan rack layouts and network topologies to support customer requirements.
- Design and evaluate automation scripts for network operations, configuring server and switch fabrics.
- Perform data center upgrades and ensure that deployment of DriveNets solutions goes smoothly.
- Install and configure DriveNets products, ensuring that performance is optimal and customers are satisfied.
- Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
- Provide feedback to internal teams such as opening bugs, documenting workarounds, and suggesting improvements.
- Engage with sales teams and customers to ensure success with major opportunities and deployments.
- Introduce new products to the DriveNets sales and support teams and to DriveNets customers.
- Deliver technical training and TOIs for support/sales engineers, partners, and customers.
- Collaborate on product definition through customer requirement gathering and roadmap planning.
Requirements
- BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or other Engineering fields, or equivalent experience.
- 3+ years of network engineering (system/solution) experience.
- 3+ years of solution architecture/sales engineering experience, or equivalent, working for a vendor, value-added reseller, or system integrator.
- Technical expertise in data center or high-end enterprise network design (e.g. BGP, EVPN, VXLAN, QoS, Multicast).
- Expertise with data center design, including networking, compute, and storage.
- Ability to write extensive technical content (white papers, technical briefs, etc.) for external audiences with a balance of technical accuracy, strategy, and clear messaging.
- Ability to multitask efficiently in a multifaceted environment and to work with teams across geographical locations.
- Clear written and oral communication skills with the ability to effectively collaborate with executives and engineering teams.
- Ability to travel domestically and internationally up to 20% of the time.
- Be Kind.
Preferred Qualifications (Nice to Have)
- Familiarity with AI-relevant data center infrastructure and networking technologies such as InfiniBand, RoCEv2, lossless Ethernet technologies (PFC, ECN, etc.), accelerated computing, GPU, NIC, and DPU.
- Understanding of AI/HPC networking infrastructure solutions, their advantages and disadvantages (AI/HPC networking design, high-speed interconnect technologies):
  - Scale-up: NVLink, UALink, etc.
  - Scale-out: Ethernet and Enhanced Ethernet (Scheduled Ethernet, dynamic load balancing and adaptive routing, Spectrum-X, UEC, etc.), InfiniBand.
  - Backend storage connectivity.
- Understanding of data center operations fundamentals in networking, cooling, and power.
- Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK Stack) and telemetry (gRPC, gNMI, OTLP, etc.).
- Proven experience with one or more Tier-1 Clouds (AWS, Azure, GCP, or OCI) or emerging NeoClouds, as well as cloud-native architectures and software.
Location
Bay Area (remote). Work-from-home role with travel to customers.