Lead Data Engineer Resume

California

SUMMARY

  • 9+ years of hands-on experience in big data technologies and the Hadoop ecosystem, including Hive, Spark, NiFi, Sqoop, Flume, Oozie, Kafka, HBase, Lucidworks Solr, MapReduce, AWS, and Java.
  • Experience in building data pipelines with Kafka, Spark Streaming, and HBase.
  • Hands-on experience in migrating Flume Java interceptors to the Spark framework.
  • Hands-on experience in developing an application with Event Hubs, ADLS, and Redis Cache with an ADF pipeline.
  • Hands-on experience in developing standalone Solr applications to update data fields.
  • Developed a data tracer application to analyze the gap from source ingestion to destination at different stages in the pipeline.
  • Hands-on experience in migrating Solr and HBase data from HDP 2.5.6 to HDP 2.6.5.
  • Experience in developing Spark applications in Scala for data extraction and analysis.
  • Experience in analyzing data using HiveQL and custom MapReduce programs in Java.
  • Experienced in loading large sets of structured, semi-structured, and unstructured data, and in importing and exporting data into HDFS and Hive using Sqoop.
  • Experience in developing data ingestion and workflow management scripts using NiFi.
  • Hands-on experience in working with NiFi to load data from multiple sources directly into HDFS.
  • Worked on big data projects such as streamline analytics and data consolidation.
  • Worked on real-time data integration using Kafka, Spark, and Cassandra.
  • Proficient in writing Spark applications in Scala and Java.
  • Experience in developing Hive UDFs and Spark UDFs.
  • Good knowledge of NoSQL databases such as HBase, MemSQL, and Cassandra.
  • Good understanding and working knowledge of the Agile software development process. Sound understanding of Agile tools like Jira and Octane.
  • Good knowledge of ETL and BI tools like Tableau and Cognos.
  • Hands-on experience in monitoring and maintaining Hadoop clusters.
  • Hands-on experience with Amazon EMR.
  • Hands-on experience in building Hortonworks Data Platform clusters with versions 2.5.x and 2.6.x.
  • Working on a Cloudera Data Platform POC to migrate HDP 2.6.5 Spark applications to the CDP environment.
  • Proficient in developing web-based applications and client-server distributed architecture applications in Java/J2EE technologies using object-oriented methodology.
  • Strong knowledge of the full software development life cycle: software analysis, design, architecture, development, and maintenance.

TECHNICAL SKILLS

Big Data Technologies: HDFS, Hive, Spark, NiFi, MapReduce, YARN, Sqoop, Oozie, ZooKeeper, Lucidworks Solr, and Flume.

Scripting Languages: Shell, Scala, and Python.

Programming Languages: Java and SQL.

Web Services: AWS EC2, S3, EMR, DynamoDB, SOAP, and REST.

Databases: SQL Server, Oracle, and MySQL.

DW & BI Tools: DataStage, Cognos, and Tableau.

NoSQL Databases: HBase, MemSQL, Redis Cache, Cosmos DB, and Cassandra.

Hadoop Environments: HDP 2.5.x, HDP 2.6.x, and CDP.

Tools: Eclipse, IntelliJ, JBuilder, and GitLab.

Operating Systems: Mesos DC/OS, Linux, UNIX, macOS, Windows 7, and Windows 8.

PROFESSIONAL EXPERIENCE

Confidential, California

Lead Data Engineer

Responsibilities:

  • Involved in the data ingestion and data processing phases.
  • Led the design and engineering of application development; accountable for the implementation and production rollout of the solutions.
  • Built legacy data pipelines such as the Flume-Kafka integration with Lucidworks Solr.
  • Worked on migrating Flume Java interceptor logic to Spark Streaming integrated with Kafka.
  • Developed data pipelines with Spark Streaming, Kafka, and Lucidworks Solr.
  • Developed a Spark application to stream processed data to Solr with the SolrJ client.
  • Developed additional components, such as HBase integration, for the existing data pipelines to persist data in HBase tables after Spark processing in Kerberized HDP 2.6.x / 2.5.x environments.
  • Developed the data tracer application to analyze the gap from source ingestion (Kafka topics) to destination (Solr, HDFS) at different stages in the pipeline.
  • Developed a Spark Streaming application to consume messages from a Kafka topic, process the data from HDFS using Spark DataFrames, and persist the DataFrame to HBase tables.
  • Developed a Spark preprocessor application for one of the Lam-specific data sets, integrating it with the existing data pipeline without breaking the flow.
  • Working on the HDFS data migration to ADLS storage.
  • Working on the existing data pipeline to migrate HDFS storage to ADLS using the Java ADLS API.
  • Have POC experience in migrating applications from the current HDP 2.6.x platform to the CDP platform.
  • Currently working on the HDInsight migration project from the HDP 2.6.x platform.
  • Utilized Agile Scrum Methodology to help manage and organize a team of four developers with regular code review sessions.
  • Designed and documented operational problems by following standards and procedures, using the software reporting tools Confluence and Jira.
  • Optimized and tuned the Hadoop environments to meet performance requirements.
  • Performed performance analysis and debugging of slow-running development and production processes.
  • Assisted the admin and support teams in maintaining high-level and low-level technical documentation.

Confidential, Chicago

Hadoop/Spark Lead Developer

Responsibilities:

  • Involved in the data ingestion, data processing, and reporting phases.
  • Led the design and engineering of application development; accountable for the implementation and production rollout of the solutions.
  • Developed NiFi workflow scripts to pull data from vendor data sources.
  • Developed a Spark application in Scala to extract input files based on their respective schemas.
  • Developed a custom Spark application for common processing functionality.
  • Developed a custom NiFi framework to automate the ingestion mechanism with minimal effort.
  • Developed a Spark application to read MemSQL data into Spark DataFrames.
  • Worked on automating data ingestion from multiple data sources into the Hadoop Distributed File System.
  • Developed business logic using Scala, Spark, and Hive.
  • Implemented the factory design pattern to handle vendor-specific data enrichment for multiple vendors.
  • Developed custom Spark jobs to move large datasets between platforms, such as SQL Server to Hadoop and MemSQL to Hadoop, and vice versa.
  • Developed Selenium automation jobs for downloading data from vendor portals.
  • Developed a custom NiFi processor to download attachments from the Confidential email box.
  • Developed email notification and scheduling jobs in Java for the Selenium automation jobs on Windows Server.
  • Developed Spark SQL scripts for implementing data transformations.
  • Developed Hive and SQL scripts to load data into Hive external tables and to create views consumed by the Tableau visualization tool.
  • Developed date and currency type conversion UDFs and sales Spark UDFs to reduce the load on Tableau.

Confidential, California

Hadoop Developer

Responsibilities:

  • Worked on Streamline analytics and data consolidation projects on the product Connect Home.
  • Integrated Kafka, Spark, and Cassandra for streamline analytics.
  • Worked on ingesting, reconciling, compacting, and purging base table and incremental table data using Hive and job scheduling through Oozie.
  • Applied DevOps principles to ensure operational excellence before deploying to production.
  • Operated the cluster on AWS using EC2, EMR, S3, and CloudWatch.
  • Imported structured data using Sqoop from MySQL and Oracle into HDFS, and vice versa, on a regular basis.
  • Developed scripts and batch jobs to schedule various Hadoop programs using Oozie.
  • Implemented Spark Streaming on all kinds of data using optimization and performance tuning techniques.
  • Gathered the business requirements from the Business Partners and Subject Matter Experts.
  • Optimized Hive queries using compact and bitmap indexing for quick lookups inside tables.
  • Created 50 buckets for each Hive ORC table, clustered by client ID, for better performance when updating the tables.
  • Used Java UDFs for performance tuning in Hive and Pig by manually driving the MapReduce part.
  • Wrote Spark programs to model data for extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
  • Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Developed Pig UDFs to pre-process data for analysis.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Utilized Agile Scrum Methodology to help manage and organize a team of four developers with regular code review sessions.
  • Prepared developer (unit) test cases and executed developer testing.
  • Designed and documented operational problems by following standards and procedures, using the software reporting tool Jira.

Confidential

Hadoop/Java Developer

Responsibilities:

  • Responsible for managing data from multiple sources.
  • Involved in the migration part of the project for 30 different sources.
  • Worked with the IBM data extraction application DataStage for ETL to get the data onto the edge node.
  • Developed cron jobs to write the input data files to the HDFS location and the archive location.
  • Developed MapReduce applications for schema validation and row count validation.
  • Imported and exported data into HDFS, and assisted in exporting analyzed data to relational databases using Sqoop.
  • After schema and row count validation, wrote MapReduce jobs to create the Avro schema.
  • Developed applications to convert .dat files to the Avro data format.
  • Wrote MapReduce jobs to create a superset schema from different Avro schemas.
  • Created Hive tables from the superset schema.
  • Participated in whiteboard sessions to gather task requirements.
