Senior Solutions Engineer, AI/HPC Networking

DriveNets
Full-time
Remote
Worldwide
DevOps / Sysadmin

About the Company

DriveNets is a leader in disaggregated high-scale networking solutions for service providers and AI infrastructure. Founded in December 2015, DriveNets created a radical new way to build networks by adapting the architectural model of the cloud to telco-grade networking. This solution accelerates network deployment, improves the network’s economic model, and makes network operations much simpler. Customers include Comcast, Orange, and KDDI. Over 80% of AT&T’s network traffic now runs through a disaggregated core powered by DriveNets software. The DriveNets Network Cloud-AI solution, based on the same technology, was introduced to the market in 2023, providing the highest-performance Ethernet-based AI networking solution, and is already deployed by hyperscalers, NeoClouds, and enterprises. Having raised over $587 million in three funding rounds, DriveNets continues to deploy the most advanced network infrastructure and is looking for the most talented people to join the team.

Responsibilities

  • Build robust AI/HPC infrastructure for new and existing customers.
  • Take a hands-on technical role in building and supporting NVIDIA/AMD-based platforms.
  • Support operational and reliability aspects of large-scale AI clusters, focusing on performance at scale, training stability, real-time monitoring, logging, and alerting.
  • Administer Linux systems, ranging from powerful GPU-enabled servers to general-purpose compute systems.
  • Design and plan rack layouts and network topologies to support customer requirements.
  • Design and evaluate automation scripts for network operations, configuring server and switch fabrics.
  • Perform data center upgrades and ensure that DriveNets solutions deploy smoothly.
  • Install and configure DriveNets products, ensuring optimal performance and customer satisfaction.
  • Maintain services once they are live by measuring and monitoring availability, latency, and overall system health.
  • Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, and refinement.
  • Provide feedback to internal teams such as opening bugs, documenting workarounds, and suggesting improvements.
  • Engage with sales teams and customers to ensure success with major opportunities and deployments.
  • Introduce new products to the DriveNets sales and support teams and to DriveNets customers.
  • Deliver technical training and TOIs for support/sales engineers, partners, and customers.
  • Collaborate on product definition through customer requirement gathering and roadmap planning.

Requirements

  • BS/MS/PhD in Electrical/Computer Engineering, Computer Science, Physics, or other Engineering fields, or equivalent experience.
  • 3+ years of network engineering (system/solution) experience.
  • 3+ years of solution architecture/sales engineering experience, or equivalent, working for a vendor, value-added reseller, or system integrator.
  • Technical expertise in data center or high-end enterprise network design (e.g., BGP, EVPN, VXLAN, QoS, multicast).
  • Expertise in data center design, including networking, compute, and storage.
  • Ability to write extensive technical content (white papers, technical briefs, etc.) for external audiences with a balance of technical accuracy, strategy, and clear messaging.
  • Ability to multitask efficiently in a multifaceted environment and to work with teams across geographical locations.
  • Clear written and oral communication skills with the ability to effectively collaborate with executives and engineering teams.
  • Ability to travel domestically and internationally up to 20% of the time.
  • Be Kind.

Preferred Qualifications (Nice to Have)

  • Familiarity with AI-relevant data center infrastructure and networking technologies such as InfiniBand, RoCEv2, lossless Ethernet technologies (PFC, ECN, etc.), accelerated computing, GPUs, NICs, and DPUs.
  • Understanding of AI/HPC networking infrastructure solutions and their advantages and disadvantages (AI/HPC networking design, high-speed interconnect technologies), including:
      • Scale-up – NVLink, UALink, etc.
      • Scale-out – Ethernet and Enhanced Ethernet (Scheduled Ethernet, dynamic load balancing and adaptive routing, Spectrum-X, UEC, etc.), InfiniBand.
      • Backend storage connectivity.
  • Understanding of data center operations fundamentals in networking, cooling, and power.
  • Familiarity with monitoring tools (e.g., Prometheus, Grafana, ELK Stack) and telemetry (gRPC, gNMI, OTLP, etc.).
  • Proven experience with one or more Tier-1 clouds (AWS, Azure, GCP, or OCI) or emerging NeoClouds, as well as cloud-native architectures and software.

Location

Bay Area (remote). Work-from-home role with travel to customers.
