Lead Data Engineer Resume California - Hire IT People

SUMMARY

9+ years of hands-on experience with Big Data technologies and the Hadoop ecosystem, including Hive, Spark, NiFi, Sqoop, Flume, Oozie, Kafka, HBase, Lucidworks Solr, and MapReduce, as well as AWS and Java.
Experience building data pipelines with Kafka, Spark Streaming, and HBase.
Hands-on experience migrating Flume Java interceptors to the Spark framework.
Hands-on experience developing an application with Azure Event Hubs, ADLS, Redis Cache, and an ADF pipeline.
Hands-on experience developing standalone Solr applications to update data fields.
Developed a data tracer application to analyze gaps between source ingestion and destination at different stages of the pipeline.
Hands-on experience migrating Solr and HBase data from HDP 2.5.6 to HDP 2.6.5.
Experience developing Spark applications in Scala for data extraction and analysis.
Experience analyzing data using HiveQL and custom MapReduce programs in Java.
Experienced in loading large sets of structured, semi-structured, and unstructured data, and in importing data into and exporting data from HDFS and Hive using Sqoop.
Experience developing data ingestion and workflow management scripts using NiFi.
Hands-on experience using NiFi to load data from multiple sources directly into HDFS.
Worked on big data projects such as Streamline analytics and Data consolidation.
Worked on real-time data integration using Kafka, Spark, and Cassandra.
Proficient in writing Spark applications in Scala and Java.
Experience developing Hive UDFs and Spark UDFs.
Good knowledge of NoSQL databases such as HBase, MemSQL, and Cassandra.
Good understanding and working knowledge of the Agile software development process. Sound understanding of Agile tools such as Jira and Octane.
Good knowledge of ETL and BI tools such as Tableau and Cognos.
Hands-on experience monitoring and maintaining Hadoop clusters.
Hands-on experience with Amazon EMR.
Hands-on experience building Hortonworks Data Platform (HDP) 2.5.x and 2.6.x clusters.
Working on a Cloudera Data Platform (CDP) POC to migrate HDP 2.6.5 Spark applications to the CDP environment.
Proficient in developing web-based applications and client-server distributed applications in Java/J2EE using object-oriented methodology.
Strong knowledge of the full software development life cycle: analysis, design, architecture, development, and maintenance.

TECHNICAL SKILLS

Big Data Technologies: HDFS, Hive, Spark, NiFi, MapReduce, YARN, Sqoop, Oozie, ZooKeeper, Lucidworks Solr, and Flume.

Scripting Languages: Shell, Scala, and Python.

Programming Languages: Java and SQL.

Web Services: AWS EC2, S3, EMR, DynamoDB, SOAP, and REST.

Databases: SQL Server, Oracle, and MySQL.

DW & BI Tools: DataStage, Cognos, and Tableau.

NoSQL Databases: HBase, MemSQL, Redis Cache, Cosmos DB, and Cassandra.

Hadoop Environments: HDP 2.5.x, HDP 2.6.x, and CDP.

Tools: Eclipse, IntelliJ, JBuilder, and GitLab.

Operating Systems: Mesos DC/OS, Linux, UNIX, Mac, Windows 7, and Windows 8.

PROFESSIONAL EXPERIENCE

Confidential, California

Lead Data Engineer

Responsibilities:

Involved in the data ingestion and data processing phases.
Led the design and engineering of application development, with accountability for the implementation and production rollout of the solutions.
Worked on building legacy data pipelines, such as the Flume/Kafka integration with Lucidworks Solr.
Worked on migrating Flume Java interceptor logic to a Spark Streaming integration with Kafka.
Developed data pipelines with Spark Streaming, Kafka, and Lucidworks Solr.
Developed a Spark application to stream processed data to Solr using the SolrJ client.
Developed additional components, such as HBase integration, for the existing data pipelines to persist data in HBase tables after Spark processing in Kerberized HDP 2.6.x/2.5.x environments.
Developed the data tracer application to analyze gaps from source ingestion (Kafka topics) to destination (Solr, HDFS) at different stages of the pipeline.
Developed a Spark Streaming application to consume messages from a Kafka topic, process the data from HDFS using Spark DataFrames, and persist the DataFrames to HBase tables.
Developed a Spark preprocessor application for a Lam-specific dataset so it could integrate with the existing data pipeline without breaking the flow.
Working on the HDFS data migration to ADLS storage.
Working on migrating the existing data pipeline's storage from HDFS to ADLS using the Java ADLS API.
Gained POC experience migrating applications from the current HDP 2.6.x platform to the CDP platform.
Currently working on an HDInsight migration project from the HDP 2.6.x platform.
Utilized Agile Scrum methodology to help manage and organize a team of four developers with regular code review sessions.
Documented operational problems following established standards and procedures, using Confluence and JIRA.
Optimized and tuned the Hadoop environments to meet performance requirements.
Performed performance analysis and debugging of slow-running development and production processes.
Assisted the admin and support teams in maintaining high-level and low-level technical documentation.

Confidential, Chicago

Hadoop/Spark Lead Developer

Responsibilities:

Involved in the data ingestion, data processing, and reporting phases.
Led the design and engineering of application development, with accountability for the implementation and production rollout of the solutions.
Developed NiFi workflow scripts to pull data from vendor data sources.
Developed a Spark application in Scala to extract input files based on their respective schemas.
Developed a custom Spark application for common processing functionality.
Developed a custom NiFi framework to automate the ingestion mechanism with minimal effort.
Developed a Spark application to read MemSQL data into Spark DataFrames.
Worked on automating data ingestion from multiple data sources into the Hadoop Distributed File System (HDFS).
Developed business logic using Scala, Spark, and Hive.
Implemented the factory design pattern to handle vendor-specific data enrichment across multiple vendors.
Developed custom Spark jobs to move large datasets between platforms, such as SQL Server to Hadoop and MemSQL to Hadoop, and vice versa.
Developed Selenium automation jobs to download data from vendor portals.
Developed a custom NiFi processor to download attachments from a Confidential mailbox.
Developed email notification and scheduling jobs in Java for the Selenium automation jobs on Windows Server.
Developed Spark SQL scripts to implement data transformations.
Developed Hive and SQL scripts to load data into Hive external tables and to create views consumed by the Tableau visualization tool.
Developed date conversion, currency type conversion, and sales Spark UDFs to reduce the load on Tableau.

Confidential, California

Hadoop Developer

Responsibilities:

Worked on streamline analytics and data consolidation projects for the Connect Home product.
Integrated Kafka, Spark, and Cassandra for streamline analytics.
Worked on ingesting, reconciling, compacting, and purging base table and incremental table data using Hive, with jobs scheduled through Oozie.
Applied DevOps principles to ensure operational excellence before deploying to production.
Operated the cluster on AWS using EC2, EMR, S3, and CloudWatch.
Imported structured data with Sqoop from MySQL and Oracle into HDFS, and vice versa, on a regular basis.
Developed scripts and batch jobs to schedule various Hadoop programs using Oozie.
Implemented Spark Streaming jobs on various kinds of data, applying optimization and performance-tuning techniques.
Gathered the business requirements from the Business Partners and Subject Matter Experts.
Optimized Hive queries using compact and bitmap indexes for fast lookups within tables.
Bucketed each Hive ORC table into 50 buckets, clustered by client ID, to improve performance when updating the tables.
Used Java UDFs for performance tuning in Hive and Pig by manually driving the MapReduce part.
Wrote Spark programs to model data for extraction, transformation, and aggregation from multiple file formats, including XML, JSON, CSV, and other compressed file formats.
Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
Developed Pig UDFs to preprocess data for analysis.
Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Utilized Agile Scrum Methodology to help manage and organize a team of four developers with regular code review sessions.
Prepared developer (unit) test cases and executed developer testing.
Documented operational problems following established standards and procedures, using the JIRA reporting tool.

Confidential

Hadoop/Java Developer

Responsibilities:

Responsible for managing data from multiple sources.
Involved in the migration phase of the project for 30 different sources.
Worked with IBM DataStage for ETL to land data on the edge node.
Developed cron jobs to write input data files to HDFS and archive locations.
Developed MapReduce applications for schema validation and row count validation.
Imported and exported data into and out of HDFS, and assisted in exporting analyzed data to relational databases using Sqoop.
After schema and row count validation, wrote MapReduce jobs to create Avro schemas.
Developed applications to convert .dat files to Avro format.
Wrote MapReduce jobs to create a superset schema from different Avro schemas.
Created Hive tables from the superset schema.
Participated in whiteboard sessions to gather task requirements.
