Lead Engineer Resume
Plano, TX
SUMMARY
- Over 13 years of professional IT experience spanning specification, design, implementation, debugging, testing, and deployment of complex software applications using Big Data technologies (Spark, Kafka, Hive, Hadoop, Snowflake), Scala, J2EE, and C++. Excellent technical, problem-solving, client-interaction, and management skills.
- 6 years of experience as a Big Data Technical Lead in distributed computing, with sound knowledge of ingestion (Flume, Kafka, Sqoop, NiFi), storage (HDFS, HBase, Snowflake, Cassandra, DynamoDB), querying (Hive, Pig, Spark SQL), processing (MapReduce, Spark, NiFi), machine learning (Spark ML), and visualization (Zeppelin, Tableau).
- Strong experience developing and operating 24/7 high-performance production streaming applications using Spark Streaming and Kafka.
- Experience with Spark Core, Spark SQL, Spark Streaming, DataFrames, and RDDs using Scala.
- Experience using DStreams, accumulators, broadcast variables, and RDD caching in Spark Streaming applications (see the sketch at the end of this summary).
- Experience writing queries to move data from HDFS into Hive and analyzing data with HiveQL.
- Strong experience developing and running Spark applications on AWS EMR.
- Strong experience with AWS cloud services such as EMR, EC2, Kinesis, DynamoDB, Lambda, S3, CloudWatch, Step Functions, VPC, SES, and SNS for fast, efficient processing of Big Data.
- Strong experience in data analytics on big data using Scala and MLlib.
- Strong experience with relational databases such as MySQL and NoSQL stores such as Redis, HBase, and Cassandra.
- Experience in data visualization with MS Excel, Tableau, and Zeppelin.
- Experience importing and exporting data with Sqoop between HDFS and relational database systems.
- Knowledge of job workflow scheduling and monitoring tools such as Airflow, Oozie, and Zookeeper.
- Experience developing and implementing SOAP and RESTful web services that integrate with multiple applications.
- Experience writing Terraform, shell scripts, MapReduce jobs, and Spark jobs to perform ETL operations.
- Experience automating builds (Maven, sbt) and development tasks using shell scripts and Python.
- Experience in middle-tier development using J2EE technologies such as JSP, Servlets, EJB, JDBC, JPA, JMS, Struts, Spring, JAXB, JAX-WS, and JAX-RS.
- Experience collaborating with overseas teams (USA, Europe, Australia, and China) to meet delivery deadlines.
- Strong experience in C and C++ programming and the Qt framework.
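A minimal sketch of the broadcast-variable, accumulator, and RDD-caching patterns mentioned above; the lookup map, file path, and field layout are illustrative assumptions, and the same pattern applies inside Spark Streaming jobs:

    import org.apache.spark.sql.SparkSession

    object BroadcastAccumulatorSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("broadcast-accumulator-sketch").getOrCreate()
        val sc = spark.sparkContext

        // Small reference map shipped once to every executor as a broadcast variable.
        val countryNames = sc.broadcast(Map("US" -> "United States", "DE" -> "Germany"))

        // Accumulator aggregated across tasks for records that cannot be resolved.
        val unknown = sc.longAccumulator("unknown-country-codes")

        // Cached because the RDD feeds two separate actions below.
        val events = sc.textFile("hdfs:///data/events.csv").cache()

        val resolved = events.map(_.split(",")(0)).map { code =>
          countryNames.value.getOrElse(code, { unknown.add(1); "Unknown" })
        }

        // Actions run the lineage; the accumulator is populated as tasks finish.
        println(s"read ${events.count()} events; resolved ${resolved.count()}; unknown codes: ${unknown.value}")
      }
    }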
TECHNICAL SKILLS
Programming Languages: Java, Scala, Python, C++, C
Big Data Platform Distribution: Huawei Fusion Insight, Cloudera, HDP, EMR
Hadoop Eco-system: Pig, Hive, HBase, Sqoop, Flume, Zookeeper, Oozie, Hue, Yarn, NiFi
Big Data Frameworks: Hadoop, Solr, Spark, Kafka
J2EE Technologies: Servlets, JSP, JSTL, EJB, JDBC, JMS, JNDI, RMI, JAX-WS, JAX-RS, Log4J
Cloud Services: Amazon Web Services, Huawei Cloud
Application Security: SOAP and Restful Web Services
Scripting Languages: Terraform, Pig, Shell, sed
Data Mining/ML Tools: Python, MLlib
Reporting and Visualization Tools: MS Excel, Zeppelin, Tableau
Schedulers: Airflow, Oozie, TCC (Huawei Internal Tool)
Message Queues/Brokers: Kafka, NiFi, Flume
Packaging/Deployment: Maven, Gradle, Ant, SBT
Version Control tools: Tortoise SVN, Git, GitHub
Databases: HBase, Cassandra, DynamoDB, Snowflake, MS SQL Server, MySQL, SQLite, Huawei MPP DB
Application/Web Servers: Tomcat
Operating Systems: Windows 7/8, Linux, Ubuntu, Android, macOS
IDE Tools: IntelliJ IDEA, Eclipse, Scala IDE, Qt Creator, Visual Studio
Testing Frameworks/Tools: JUnit, MRUnit, JMeter, ScalaTest
Desktop Application Frameworks: Qt, Visual Studio MFC
Web Designing Tools: JSP
DevOps Tools: Jenkins, Git, Ansible
Agile Project Management, Quality Processes and Defect Tracking Tools: JIRA, Rally, Six Sigma Green Belt, Huawei DTS Tool
Installer Tools: NSIS, MSI
PROFESSIONAL EXPERIENCE
Confidential, Plano, TX
Lead Engineer
Tools: Scala, Kafka, Spark, HDFS, Snowflake, EMR, S3, Lambda and other AWS services, Maven, Looker, DynamoDB, Airflow, Terraform, HBase
Responsibilities:
- Gathered business requirements from business partners and ensured successful delivery.
- Designed the xref graph data schema.
- Designed and developed Spark Scala applications to build data models and run efficiently on AWS EMR clusters.
- Processed multiple Kinesis streams using Spark and Scala.
- Developed real-time streaming applications using Spark Streaming with Scala.
- Used DynamoDB to store reference data.
- Tuned Spark applications in the production environment.
- Developed Apache Airflow workflows to automate the various data ingestions into the integrated graph.
- Built the Snowflake database and tables and loaded data using Snowpipe (see the sketch following this list).
- Built a Jenkins CI/CD pipeline to automate build and deployment on AWS.
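A minimal sketch of the kind of EMR Spark job feeding Snowflake described above, using the Snowflake Spark connector; the bucket, table, and connection options are hypothetical placeholders (real credentials would come from a secrets store):

    import org.apache.spark.sql.{SaveMode, SparkSession}

    object SnowflakeLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("xref-snowflake-load").getOrCreate()

        // Hypothetical connection options for the Snowflake Spark connector.
        val sfOptions = Map(
          "sfURL"       -> "account.snowflakecomputing.com",
          "sfUser"      -> "etl_user",
          "sfDatabase"  -> "ANALYTICS",
          "sfSchema"    -> "PUBLIC",
          "sfWarehouse" -> "ETL_WH"
        )

        // Read curated model output from S3 and append it to a Snowflake table.
        spark.read.parquet("s3://example-bucket/xref/")
          .write
          .format("net.snowflake.spark.snowflake")
          .options(sfOptions)
          .option("dbtable", "XREF_GRAPH")
          .mode(SaveMode.Append)
          .save()
      }
    }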
Confidential, Franklin, TN
Tools: Scala, NiFi, Kafka, Spark SQL, Spark Structured Streaming, Hadoop, Maven, HDP, Zeppelin, Cassandra.
Responsibilities:
- Designed schema-based data lakes/data stores.
- Designed an end-to-end Spark Streaming solution, from data ingestion to visualization.
- Developed NiFi workflows for data ingestion into Kafka topics.
- Applied transformations to data in NiFi.
- Developed real-time streaming applications using Spark Structured Streaming with Scala (see the sketch following this list).
- Analyzed data and developed data models using Spark and Scala.
- Designed HBase tables and developed the integration of Spark with Cassandra.
- Tuned Spark applications in the production environment.
- Developed dashboards for data visualization using Zeppelin.
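A minimal sketch of the Kafka-to-Cassandra Structured Streaming flow described above, using the Spark Cassandra connector; the broker address, topic, keyspace, and table names are hypothetical:

    import org.apache.spark.sql.{DataFrame, SparkSession}

    object KafkaToCassandra {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("kafka-to-cassandra").getOrCreate()

        // Read the raw event stream from Kafka (broker and topic are placeholders).
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(key AS STRING) AS id", "CAST(value AS STRING) AS payload")

        // Write each micro-batch to Cassandra through the Spark Cassandra connector.
        val writeBatch: (DataFrame, Long) => Unit = (batch, _) =>
          batch.write
            .format("org.apache.spark.sql.cassandra")
            .option("keyspace", "analytics")
            .option("table", "events")
            .mode("append")
            .save()

        events.writeStream
          .foreachBatch(writeBatch)
          .option("checkpointLocation", "/tmp/checkpoints/kafka-to-cassandra")
          .start()
          .awaitTermination()
      }
    }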
Confidential, Santa Clara, California
Tools: Scala, Java/J2EE, Flume, Kafka, Spark SQL, Spark Streaming, Spark ML, Hive, Hadoop, web services, Tomcat, Maven, Huawei Fusion Insight (customized Cloudera/HDP)
Responsibilities:
- Designed an application to ingest data from Huawei channel networks such as Vmall, Music, Video, App Store, and Ad SDK, as well as third-party data.
- Partitioned ingested data to improve data processing speed.
- Deployed Flume to efficiently collect real-time data.
- Processed multiple Kafka topics using Scala.
- Processed real-time data using Spark Streaming (DStreams) with Scala (see the sketch following this list).
- Analyzed data and developed data models using Spark and Scala.
- Implemented Hive scripts to analyze batch data for tag and audience generation.
- Developed web services to provide APIs for DSP/SSP systems.
- Implemented shell scripts to export data from Hive to MySQL for DSP/SSP platform queries.
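A minimal sketch of the DStream-based Kafka processing described above; the broker, topic names, and consumer group are hypothetical placeholders:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    object KafkaDStreamApp {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("kafka-dstream-app"), Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker:9092",          // placeholder broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "audience-gen",         // placeholder consumer group
          "auto.offset.reset"  -> "latest"
        )

        // Direct stream over the topics; each record value is a raw event line.
        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("clicks", "impressions"), kafkaParams))

        // Count events per topic in every 10-second batch.
        stream.map(record => (record.topic, 1L)).reduceByKey(_ + _).print()

        ssc.start()
        ssc.awaitTermination()
      }
    }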
Confidential
Tools: Hadoop, HDFS, MapReduce, Hive, Pig, Python, HBase, Oozie, YARN, Core Java, Oracle, SQL, Ubuntu/Unix, Eclipse, Maven, JDBC drivers, MySQL, Linux, AWS, XML, SVN, PuTTY, Spark, Scala
Responsibilities:
- Wrote MapReduce jobs to write full-history public sourcing documents (3 billion documents) to HBase and Elasticsearch.
- Loaded historical data as well as incremental customer and other data into Hadoop through Hive. Applied the required business logic to the data in Hive and generated the required output as flat files.
- Wrote a JSON API to retrieve public sourcing documents from Elasticsearch.
- Worked on building the Big Data infrastructure stack, including Elasticsearch, the Hadoop stack (HDFS, MapReduce, HBase, Zookeeper), and a private data cloud.
- Processed multiple Kafka topics using Scala.
- Processed real-time data using Spark Streaming (DStreams) and Scala.
- Analyzed data and developed data models using Spark and Scala.
- Involved in data modeling in Hadoop.
- Implemented a Kafka producer application.
- Developed Oozie schedules for batch execution.
- Developed shell scripts to calculate the parameters needed for the Oozie flow and start batch execution.
- Moved data from Hadoop/Hive to Elasticsearch (see the sketch following this list).
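A minimal sketch of moving Hive data into Elasticsearch with the elasticsearch-hadoop Spark integration; the Hive table, index name, id column, and node address are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.elasticsearch.spark.sql._

    object HiveToEs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-to-es")
          .config("es.nodes", "es-host:9200") // hypothetical Elasticsearch node
          .enableHiveSupport()
          .getOrCreate()

        // Read curated documents from a Hive table and index them into Elasticsearch,
        // using the document id column as the Elasticsearch _id.
        val docs = spark.sql("SELECT doc_id, title, body FROM sourcing.documents")
        docs.saveToEs("sourcing-docs", Map("es.mapping.id" -> "doc_id"))
      }
    }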
Confidential
Tools: Hadoop, HDFS, MapReduce, Hive, Pig, Python, HBase, Oozie, YARN, Core Java, Oracle, SQL, Ubuntu/Unix, Eclipse, Maven, JDBC drivers, Mainframe, MySQL, Linux, AWS, XML, CRM, SVN, PDSH, PuTTY, BigInsights
Responsibilities:
- Created the project using Hive, BigSQL, and Pig.
- Implemented partitioning and bucketing in Hive (see the sketch following this list).
- Involved in data modeling in Hadoop.
- Created Hive tables and worked on them using HiveQL.
- Wrote Apache Pig scripts to process HDFS data.
- Created Java UDFs in Pig and Hive.
- Designed end-to-end ETL workflows using Hadoop.
- Participated in backup and recovery of the Hadoop file system.
- Automated tasks using Unix shell scripts.
- Analyzed requirements and prepared solutions for each requirement.
- Gathered business requirements from business partners and subject matter experts.
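A minimal sketch of the Hive partitioning and bucketing mentioned above, issued here through a Hive-enabled SparkSession for consistency with the other sketches (the same DDL runs unchanged in the Hive CLI); database, table, and column names are illustrative:

    import org.apache.spark.sql.SparkSession

    object PartitionedHiveTable {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("hive-partitioning")
          .enableHiveSupport()
          .getOrCreate()

        // Partition by ingest date and bucket by customer id to speed up scans and joins.
        spark.sql("""
          CREATE TABLE IF NOT EXISTS sales.orders (
            order_id    BIGINT,
            customer_id BIGINT,
            amount      DOUBLE
          )
          PARTITIONED BY (ingest_date STRING)
          CLUSTERED BY (customer_id) INTO 32 BUCKETS
          STORED AS ORC
        """)
      }
    }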
Confidential
Tools: Java, HDFS, MapReduce, Hive, Huawei Fusion Insight (customized Cloudera/HDP), MS Excel, MySQL, FineReport
Responsibilities:
- The platform supported batch data processing on a 320-node production cluster with a daily ingest of 35 TB, processing 200 billion records on a typical day. Responsible for building scalable distributed data solutions using Hadoop.
- Installed and configured Hive, HDFS, Zookeeper, and MapReduce on the Hadoop cluster.
- Developed a tool to securely transfer batch data from different application servers to HDFS for analysis.
- Implemented reports using MS Excel and FineReport. Processed JSON, CSV, and Parquet data formats.
- Implemented Hive scripts and MapReduce programs based on the business requirements to process data from different systems. Implemented custom UDFs to support the business and meet data security requirements (see the sketch following this list).
- Implemented scripts to export data from MySQL (application servers) to Hive and on to MySQL (business servers). Implemented a scheduler framework to schedule the batch scripts for daily data ingestion and processing in the production environment.
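A minimal sketch of the kind of data-security Hive UDF mentioned above, written here in Scala for consistency with the other sketches (the original was likely Java); the class name and masking rule are illustrative:

    import org.apache.hadoop.hive.ql.exec.UDF

    // Illustrative masking UDF. Registered in Hive with, e.g.:
    //   ADD JAR mask-udf.jar;
    //   CREATE TEMPORARY FUNCTION mask_id AS 'MaskId';
    class MaskId extends UDF {
      // Keep the last four characters of an identifier and mask the rest.
      def evaluate(id: String): String = {
        if (id == null) null
        else if (id.length <= 4) id
        else "*" * (id.length - 4) + id.takeRight(4)
      }
    }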
Confidential
Tools: C++, QT, MySQL
Responsibilities:
- Design, coding, and testing.
- Developed Qt/C++ components to show the different device-detection statuses in data cards (modems).
- Trained new team members.
- Acted as module lead for multiple modules in the team.
- Performed custom Qt compilation to reduce library size.
- Maintained CI for agile development and automated the build and release procedure.
Confidential
Tools: C++, QT, MySQL
Responsibilities:
- Involved in GUI design.
- Involved in development of the GUI and core application using C++ and Qt4 integrated with MSVS 2005.
- Maintained an SQLite database to store all configured data and plotted data acquired over RS-232.