Diepvu Le
About
Diepvu works in the following industries: "Computer Hardware", "Computer Software", "Semiconductors", "Internet", and "Computer Games". Diepvu is currently Senior Manager, Software Engineering at Red Hat. In Diepvu's previous role as a Senior Engineering Manager at Branch, Diepvu worked until Aug 2021. Prior to joining Branch, Diepvu was an Engineering Manager II at Uber in San Francisco / Palo Alto, and before that a Sr. Software Engineer II at Uber, based in the San Francisco / Palo Alto Bay Area, from Jan 2016 to Oct 2018. From Oct 2013 to Jan 2016, Diepvu was a Staff Software Engineer and Software Engineering Manager at Twitter, based in the San Francisco Bay Area, and prior to that a Sr. Software Engineer at Twitter from Sep 2012 to Oct 2013. Diepvu started working as System Engineering Manager at Zynga in the San Francisco Bay Area in May 2011.
Diepvu Le's current jobs
Red Hat Advanced Cluster Security for Kubernetes (RHACS). We are hiring! Reach out to me if interested: https://careers-redhat.icims.com/jobs/search?ss=1&searchKeyword=Advanced+Cluster+Security
Diepvu Le's past jobs
Leading the data engineering and platform teams at Branch, providing end-to-end big data solutions and realtime data analytics used by more than 50,000 mobile apps. Branch powers mobile growth for 50,000+ of the most advanced apps in the world, including household names like Airbnb, Buzzfeed, Twitch, Under Armour, and many more. Every day, our customers generate more than 100 million new URLs, and we process 2.5TB of raw data from over 10 billion events.
Managing the Data Foundation - Streaming & Realtime Analytics Platform
Data Analytics / Infrastructure / Storage engineer:
• Developed and supported a GPU-based analytics engine -- https://github.com/uber/aresdb
• Streamed data in the cloud -- Kafka ecosystem on AWS.
• Data Foundation - Streaming and Realtime Analytics Platform
• Data Foundation - Hadoop Ecosystem Engineering
• Storage as a Service -- Cassandra on Mesos, MySQL, Postgres.
• Built the distributed Marketplace Storage System for Uber's dispatch and fulfillment (internally called MSG)
Built the Hardware Quality team to support a large infrastructure, focusing on Hardware & System Reliability
Managed the Hardware & System Reliability Engineering team: a group of talented engineers responsible for all production reliability aspects of hardware, BIOS, firmware, and the Linux kernel/OS for more than 200,000 of Twitter's servers. Our main customer was the Site Reliability Engineering (SRE) team. We supported them across many different technologies, including Mesos & Aurora, Hadoop, Vertica, and DB ecosystems, to handle all system- or performance-related issues that impacted Twitter's infrastructure and applications. We provided frameworks and managed end-to-end BIOS, firmware, and kernel upgrades across a very large number of servers:
• Handled and oversaw end-to-end hardware quality issues and managed a low Total Cost of Ownership (TCO) for each hardware platform. Involved early in the product manufacturing process -- as early as the Design Verification Test (DVT) transition into Production Verification Test (PVT). Successfully supported and empowered the team to establish sets of production operational requirements, raising the bar on hardware quality.
• Built automation frameworks to monitor and mine server health and quality data across the fleet. Our team provided weekly hardware quality metrics such as the HW Annual Failure Rate (AFR), hardware component issues (SSDs, hard disks, memory, etc.), and failure statistics per hardware platform, vendor, etc. (see the AFR sketch after this list). Real-time TCO metrics and charge-back were also an important goal on our team roadmap.
• Built the hardware quality & sustaining processes.
• Implemented better quality control and incident avoidance.
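The AFR metric mentioned above is conventionally defined as failures per device-year of operation. A minimal sketch of that calculation, assuming hypothetical inputs (the team's actual pipeline and field names are not described here):

def annual_failure_rate(failures: int, units: int, hours_observed: float) -> float:
    """Estimate AFR as failures per device-year, expressed as a percentage.

    failures       -- count of failed units in the observation window
    units          -- fleet size under observation
    hours_observed -- length of the observation window, in hours
    """
    device_years = units * hours_observed / 8760.0  # 8760 hours per year
    return 100.0 * failures / device_years

# Illustrative numbers: 120 disk failures across 50,000 drives over 90 days
print(f"AFR: {annual_failure_rate(120, 50_000, 90 * 24):.2f}%")  # ~0.97%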
Infrastructure engineer. Built automation frameworks to monitor and mine server health and quality data across the fleet. Helping everyone tweet...
Managed and supported the system engineering team (including members across geographies) to build tools and support the high availability & serviceability of Zcloud (Zynga's private cloud). Architected and led a small system engineering team to build Zynga's bare-metal and virtualized server farms -- successfully built two new Zynga datacenters in California, with about 30,000 servers online and in service for Zynga's games. Below are a couple of highlights:
• System performance and power-consumption benchmarks, and capacity planning, to build our new Zynga gaming infrastructure -- Zcloud.
• Worked with OEMs and ODMs on new system architectures, such as new Intel CPU architectures.
• Built tools to automatically provision, maintain, and support the high availability of Zynga's infrastructure and services. Successfully provisioned and supported more than 40,000 production servers across all datacenter locations, with different hardware from multiple vendors.
• Built a system burn-in framework to automate component performance qualification.
• Built out-of-band management with IPMI commands, working with Ganglia and Nagios, to monitor systems and support system administration operations (see the sketch after this list). Standardized what I called MORE ('Minimum Operational Requirements or Equivalents') for BMC/IPMI from all vendors.
• Debugged, analyzed, and root-caused Linux kernel crashes, and patched in kernel/driver bug fixes.
• Created Splunk dashboards to monitor system/kernel issues and proactively plan out solutions to known, widespread kernel issues.
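As a rough illustration of the out-of-band IPMI monitoring above, the sketch below shells out to the standard ipmitool CLI to read sensor data from a BMC. The host, credentials, and parsing are placeholders, not Zynga's actual tooling:

import subprocess

def read_sensors(host: str, user: str, password: str) -> dict[str, str]:
    """Query a BMC out-of-band via ipmitool and return {sensor_name: reading}."""
    out = subprocess.run(
        ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password, "sensor"],
        capture_output=True, text=True, check=True,
    ).stdout
    readings = {}
    for line in out.splitlines():
        # ipmitool sensor output is pipe-delimited: name | value | units | status | ...
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 2:
            readings[fields[0]] = fields[1]
    return readings

# Placeholder host and credentials; a real deployment would feed these
# readings into Ganglia/Nagios checks rather than printing them.
for name, value in read_sensors("10.0.0.42", "admin", "secret").items():
    print(name, value)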
Developed new feature requests and handled customer escalations (active member of the tiger team) for storage- and network-related issues in ESX server products (2.5.x to 5.0.x) from a large customer/partner base, including IBM, HP, Fujitsu, and Dell. Specialized in iSCSI, NFS, SAS, SATA, tape, USB, and multi-path storage drivers in the ESX kernel virtualization environment. A few highlights are listed below:
• Designed and enhanced the SCSI multi-path I/O modules (internally named PSA -- Pluggable Storage Architecture) of the virtual machine kernel (vmkernel) to protect against path failures. Greatly improved ESX boot time by introducing a target/LUN parallel discovery mechanism, which successfully resolved many major customer issues on older products.
• Implemented a SCSI LUN reset feature in the software-initiator iSCSI driver for the vmkernel multi-path code, to cleanly perform path-failover without corrupting concurrent I/O on other iSCSI LUNs behind the same SCSI target. This feature successfully resolved many customer path-failover issues on software-initiator iSCSI storage.
• Designed and implemented storage device/path claiming rules to enhance the flexibility of ESX server in managing its devices/paths. Successfully handled the dynamically conflicting requirements from EMC and Dell EqualLogic for their management LUNs with just a configuration option change for their devices/paths in ESX server. The device/path claim rule mechanism became a major feature in ESX 4.0.x, enabling major vendors' storage plug-in modules in the ESX environment -- EMC's storage multipath module (EMC PowerPath) loads at the same time as VMware's in-house storage multipath module, and device/path claim rules determine which storage module claims the paths to a particular storage LUN (see the sketch after this list).
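To make the claim-rule idea concrete: each rule pairs a match condition with the multipath plug-in that should own matching paths, and the lowest-ordered matching rule wins. This is a hypothetical, much-simplified model of that first-match semantics, not VMware's actual implementation:

from dataclasses import dataclass

@dataclass
class ClaimRule:
    order: int     # rules are evaluated lowest-order first
    vendor: str    # "*" matches any vendor
    model: str     # "*" matches any model
    plugin: str    # multipath module that claims matching paths

RULES = [
    ClaimRule(101, "EMC", "*", "PowerPath"),   # vendor plug-in claims EMC LUNs
    ClaimRule(65535, "*", "*", "NMP"),         # catch-all: in-house multipath module
]

def claim_path(vendor: str, model: str) -> str:
    """Return the plug-in that claims a path, per first-match claim-rule semantics."""
    for rule in sorted(RULES, key=lambda r: r.order):
        if rule.vendor in ("*", vendor) and rule.model in ("*", model):
            return rule.plugin
    raise LookupError("no claim rule matched")

print(claim_path("EMC", "SYMMETRIX"))    # -> PowerPath
print(claim_path("EQLOGIC", "100E-00"))  # -> NMP

Changing which module owns a device then reduces to editing the rule table, which matches the "configuration option change" described above.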
LINUX OS / MAC OS X kernel development, multi-threaded programming, SCSI/SATA, PCI/PCI-X/PCI-Express, and RAID (0, 1, 10, 5, N-way mirror) device driver development for SATA storage products, including the SteelVine product line.
· Architected and deployed a monolithic code-base (SCSI) driver architecture -- a single driver source tree -- to control all of Silicon Image's PCI, PCI-X, and PCI-Express to SATA Host Bus Adapters (HBAs), supporting multiple operating systems (Windows, Linux, NetWare, Mac OS X) on multiple platforms (32/64-bit architectures of Intel, AMD, and PowerPC). This resulted in a great cost reduction in software development and testing, and eased product release management. The architecture enabled reuse of driver code and software components across platforms and OSes, shortening development cycles. Products: Sil3112, Sil3114, Sil3124, Sil3131, Sil3132.
· More ... Original firmware developer of the SATA port multiplier named SiliconImage SteelVine.
Software architect/lead for PDC's Parallel ATA / Serial ATA RAID (0, 1, 10) storage controller products. Managed daily activities and led a group of four software engineers to successfully deliver the beta shipment on a tight schedule. LINUX/FreeBSD device driver developer for PDC's Universal Host Bus Adapter (UHBA) to control Parallel ATA / Serial ATA drives. Successfully delivered high-performance, high-reliability RAID SCSI subsystem device drivers. Improvised software solutions/workarounds to hardware problems. With a strong understanding of hardware and integration issues, I contributed a number of algorithms and methods, in both hardware and software, toward rolling out high-performance, highly flexible SATA RAID storage controller products. Technical lead of PDC's software group and the offshore software group (SPSoft Inc. in India).
Kernel / System Manager / Fibre Channel device driver developer for the 3PARdata network storage system. I was the 34th employee.
• Created detailed algorithms to eliminate data inconsistencies, prevent data loss, and perform data recovery under uncontrolled shutdowns, along with data storage management features. This resulted in successful customer critical-quality tests for the Beta and General Availability shipments.
• Established and deployed infrastructure for communication interface protocols: between the BIOS and the 3PAR controller-node kernel software using SMI and I2C; between user space and the kernel of the Linux OS using IOCTL; between clients and the 3PAR storage box using TCP/IP with buffer size changes on demand (see the sketch after this list); and between the JBOD chassis and the System Manager software.
• Designed and rolled out new Linux kernel services for monitoring and reporting the system's environment and logging, enabling proactive and predictive customer support actions.
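One of the interfaces above -- client-to-array communication over TCP/IP with on-demand buffer sizing -- can be approximated with standard socket options. A generic sketch, assuming a reachable listener; the host, port, and sizes are illustrative and not the 3PAR protocol:

import socket

def connect_with_buffers(host: str, port: int, bufsize: int) -> socket.socket:
    """Open a TCP connection whose kernel send/receive buffers are resized on demand."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Request larger kernel buffers before connecting; the OS may round or cap them.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bufsize)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bufsize)
    sock.connect((host, port))
    return sock

# Illustrative values only: bump buffers to 1 MiB for a bulk transfer,
# assuming a server is listening at this placeholder address.
conn = connect_with_buffers("192.0.2.10", 5000, 1 << 20)
actual = conn.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(f"effective receive buffer: {actual} bytes")
conn.close()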
Server developer for the Accrue Insight product: Developed and implemented the DAS (Database Access Servlet) for Accrue Insight. Successfully provided a set of Application Programming Interfaces (APIs) that allowed the UI layer to connect to the Accrue Warehouse. Restructured Accrue's Data Analyzer component, the core of the Accrue Insight product, which parsed and transformed web data into database schema formats automatically. Improved the data loading rate into the Accrue Warehouse (ORACLE / REDBRICK databases) by 20% by optimizing the loader component. Established a security protocol using the MD5 and RC4 algorithms to safely transfer data from Collectors (remote web data collectors) to the Accrue Warehouse (sketched below). Set up the entire environment to facilitate Accrue Insight product releases. Determined product-build cycles.
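The collector-to-warehouse scheme above (RC4 for confidentiality, MD5 for integrity) can be sketched in a few lines. This is only an illustration of that era's approach, not Accrue's actual code; the pack/unpack helpers are hypothetical, and both primitives are considered broken by modern standards:

import hashlib

def rc4(key: bytes, data: bytes) -> bytes:
    """RC4 stream cipher: key-scheduling (KSA) then keystream XOR (PRGA)."""
    S = list(range(256))
    j = 0
    for i in range(256):                      # KSA
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    out = bytearray()
    i = j = 0
    for byte in data:                         # PRGA
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(byte ^ S[(S[i] + S[j]) % 256])
    return bytes(out)

def pack(key: bytes, payload: bytes) -> bytes:
    """Append an MD5 digest for integrity, then encrypt the whole blob with RC4."""
    return rc4(key, payload + hashlib.md5(payload).digest())

def unpack(key: bytes, blob: bytes) -> bytes:
    """Decrypt (RC4 is symmetric) and verify the trailing 16-byte MD5 digest."""
    plain = rc4(key, blob)
    payload, digest = plain[:-16], plain[-16:]
    if hashlib.md5(payload).digest() != digest:
        raise ValueError("integrity check failed")
    return payload

secret = b"shared-key"
blob = pack(secret, b"clickstream batch 42")
assert unpack(secret, blob) == b"clickstream batch 42"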