Lead Data Engineer Resume
Columbus, GA
SUMMARY
- 12 years of IT experience in data engineering and analytics.
- Worked on distributed and cloud-based big data technologies including Apache Hadoop, Spark, PySpark, AWS, and Oracle.
- Proficient in programming languages including Python, Scala, and Java.
- Expertise in building ETL frameworks using various tools and technologies.
- Worked with a variety of data storage formats such as Parquet, ORC, Avro, XML, JSON, XLS, and CSV, covering structured, semi-structured, and unstructured data.
- Good exposure to supervised and unsupervised learning, natural language processing (NLP), and mathematical and statistical methods.
- Developed data warehouse solutions in Hadoop using HDFS, Hive, Pig, Sqoop, HBase, Oozie, Cloudera Hue, Cloudera Manager, Scala, Spark, Python, Java, Impala, Ambari, and Ranger.
- Developed cloud-based solutions using AWS Redshift, Glue, Lambda, Athena, S3, and Redshift Spectrum, as well as Azure Data Factory, Synapse, and Databricks.
- Proficient in other tools and technologies such as Oracle 11g/12c, SQL Server, MySQL, and Netezza.
- Experienced in designing, documenting, and implementing data warehouse strategies, including building ETL, ELT, and data pipeline processes.
- Collaborated with stakeholders such as product managers, architects, analysts, and project managers to deliver solutions.
- Thorough understanding of software lifecycle management, following best practices throughout.
- Supervised team activities including work scheduling, technical direction, and standard development practices.
- Believing in continuous improvement, developed multiple frameworks across my career to improve team efficiency.
TECHNICAL SKILLS
Big Data Ecosystems: Apache Hadoop, MapReduce, Spark, HDFS, HBase, Hive, Pig, Sqoop, Oozie, Kafka
Cloud Ecosystems: AWS EC2, Redshift, Glue, Lambda, Athena, S3, EMR, Spectrum, Azure Data Factory, Synapse, Databricks
Languages: Python, Scala, Java, PL/SQL
Machine learning: Scikit-learn, Pandas, Matplotlib, NumPy, NLTK
Databases: Oracle, Netezza, RedShift, SQL Server, MySQL
NoSQL Database: HBase, Elastic Search
Operating Systems: Windows, Red Hat Linux
Tools Used: Spyder, PyCharm, Toad, IntelliJ, Anaconda, Atlan Data Catalog
Streaming Tools (Real-time): Kafka, NiFi
Version Controls: SVN, TFS, Mercurial, Bitbucket, Git
Data Processing: Structured, semi-structured, and unstructured data
PROFESSIONAL EXPERIENCE
Confidential, Columbus, GA
Lead Data Engineer
Responsibilities:
- As the primary resource, understood the complete project and data architecture and worked closely with the client team to improve data collection, quality, reporting, and analytics capabilities.
- Built data pipelines using AWS Glue, Glue Data Catalog, S3, Athena, and Lambda (see the sketch below).
- Built ETL data pipelines using Azure Blob Storage, Data Factory, Synapse, and Azure Databricks.
- Analyzed the complexity of the existing system and proposed new solutions to run the pipelines more efficiently.
- Optimized existing pipelines, reducing operational cost and improving scalability and usability.
Technologies: AWS Glue, Glue Data Catalog, S3, Athena, Lambda, MySQL, Azure Blob Storage, Data Factory, Synapse, Azure Databricks, Python, PySpark, Bitbucket, Spyder, Atlan Data Catalog
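
A minimal sketch of the Glue-based pipeline pattern described above, assuming a simple batch job: read a table registered in the Glue Data Catalog, filter it, and write Parquet back to S3 for Athena to query. The database, table, column, and bucket names here are hypothetical placeholders, not the actual project configuration.

```python
import sys
from awsglue.transforms import Filter
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

# Standard Glue job bootstrap
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a source table registered in the Glue Data Catalog
# ("sales_db" and "raw_orders" are hypothetical names)
orders = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders"
)

# Keep only completed orders (illustrative transformation)
completed = Filter.apply(frame=orders, f=lambda r: r["status"] == "COMPLETED")

# Write curated Parquet to S3 so Athena can query it
glue_context.write_dynamic_frame.from_options(
    frame=completed,
    connection_type="s3",
    connection_options={"path": "s3://example-curated-bucket/orders/"},
    format="parquet",
)

job.commit()
```

Once the curated S3 location is crawled or registered in the Data Catalog, Athena can query the Parquet output directly.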
Confidential, Burlington, MA
Sr Specialist
Responsibilities:
- Served as lead engineer and analyst; designed the end-to-end data lake architecture and helped analyze employee flight-risk and learning models.
- Proposed, designed, and developed a generic plug-and-play ETL framework on PySpark, enabling developers to configure and run any new pipeline quickly and efficiently and reducing development effort by more than 50% (see the sketch below).
- Helped the team develop a Python-based data validation utility to detect data issues at an early stage of the pipeline.
- Performed a variety of tasks to facilitate project completion, including coordinating with different teams and helping resolve complex production issues.
- Strategized an efficient migration from Hadoop to the AWS cloud and Palantir platforms.
- Helped the team migrate data from Hadoop to the AWS and Palantir platforms in an efficient manner.
Technologies: Hadoop, PySpark, HDFS, Hive, Sqoop, Oozie, Bitbucket, PyCharm, AWS Glue, Lambda, Athena, S3, EMR, Spectrum, Elastic Search, Scikit-learn, Pandas, NumPy, NLTK, Data modeling, Dataiku, Palantir, Ambari
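
A rough illustration of how a configuration-driven ETL framework like the one above could look: each pipeline is described by a JSON file (sources, a SQL transformation, and a target), and one generic PySpark driver executes it. The config keys, file names, and paths are assumptions made for this example, not the framework's actual interface.

```python
import json
from pyspark.sql import SparkSession


def run_pipeline(config_path: str) -> None:
    """Run a pipeline entirely described by a JSON config file (hypothetical schema)."""
    with open(config_path) as f:
        config = json.load(f)

    spark = SparkSession.builder.appName(config.get("name", "generic_etl")).getOrCreate()

    # Register each configured source as a temporary view
    for source in config["sources"]:
        df = (
            spark.read.format(source["format"])
            .options(**source.get("options", {}))
            .load(source["path"])
        )
        df.createOrReplaceTempView(source["view"])

    # Apply the configured SQL transformation
    result = spark.sql(config["transform_sql"])

    # Write the result to the configured target
    target = config["target"]
    result.write.mode(target.get("mode", "overwrite")).format(target["format"]).save(target["path"])


if __name__ == "__main__":
    # Hypothetical config file; a real deployment would pass this path as an argument
    run_pipeline("pipeline_config.json")
```

With this pattern, adding a new pipeline means writing a new JSON file rather than new Spark code, which is where the reduction in development effort comes from.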
Confidential
Sr Data Engineer
Responsibilities:
- Helped migrate legacy DataStage pipelines to the Hadoop and Spark ecosystem efficiently, together with the team.
- Championed and implemented a Python-based file monitoring framework for all non-Hadoop pipelines, saving 5 hours of operations time per week.
- Built customer sessionization logic on coupon-website clickstream data using PySpark, which eliminated the need for an HBase cluster (see the sketch below).
- Developed a configurable, generic PySpark utility that generates XML reports from a JSON configuration file, serving different clients based on configuration.
Technologies: Hadoop, PySpark, HDFS, Hive, Sqoop, Oozie, Mercurial, PyCharm, Python, Pandas, Apache Solr, Unix Shell scripting, Cloudera Manager
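
Clickstream sessionization of the kind mentioned above is commonly done in PySpark with window functions: order each user's events by time, start a new session whenever the gap to the previous event exceeds a timeout, and take a running sum of those session breaks as the session id. The column names, paths, and 30-minute timeout below are assumptions for illustration only.

```python
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("sessionization").getOrCreate()

# Hypothetical clickstream input with user_id and event_time (timestamp) columns
clicks = spark.read.parquet("s3://example-bucket/clickstream/")

SESSION_TIMEOUT_SEC = 30 * 60  # assume a 30-minute inactivity timeout

w = Window.partitionBy("user_id").orderBy("event_time")

sessions = (
    clicks
    # Seconds since the user's previous event
    .withColumn("prev_time", F.lag("event_time").over(w))
    .withColumn(
        "gap_sec",
        F.unix_timestamp("event_time") - F.unix_timestamp("prev_time"),
    )
    # A new session starts on the first event or after a long gap
    .withColumn(
        "new_session",
        F.when(
            F.col("gap_sec").isNull() | (F.col("gap_sec") > SESSION_TIMEOUT_SEC), 1
        ).otherwise(0),
    )
    # Running sum of session breaks gives a per-user session number
    .withColumn("session_id", F.sum("new_session").over(w))
)

sessions.write.mode("overwrite").parquet("s3://example-bucket/sessions/")
```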
Confidential
Sr Database Engineer
Responsibilities:
- Provided excellent leadership by recommending the right technologies and solutions for a given use case.
- Provided technical support to resolve or assist in resolution of issues relating to production systems.
- Involved in requirements gathering and designing end-to-end project architecture in Hadoop.
- Built and automated a report using MS SSIS, reducing manual effort by 2 hours per week.
- Involved in migrating data pipelines from Netezza to the AWS environment.
- Initiated multiple process improvement activities.
Technologies: Hadoop, Spark, Scala, HDFS, Hive, Pig, HBase, Sqoop, Oozie, SVN, IntelliJ, Unix Shell scripting, Data modeling, AWS Glue, S3, Redshift, MS SSIS, SQL Server
Confidential, Danbury, CT
Data Engineer
Responsibilities:
- Built a data lake in Hadoop, migrating pipelines and data from Netezza to Hive and Pig scripts.
- Worked on the top revenue-generating analytical project, which analyzes trends in pharmaceutical products and market segments.
- Built a cost-effective, generic pipeline and data model that accommodates multiple client reports on a single data platform.
- Worked on performance improvement of data pipelines and received an out-of-the-box-thinker award.
- Provided excellent leadership by recommending the right technologies and solutions for a given use case.
- Designed best practices to support continuous process automation for data ingestion and data pipeline workflows.
- Prepared and presented reports, analyses, and presentations to various stakeholders, including executives.
Technologies: Oracle PLSQL, Netezza, Hadoop, Python, HDFS, Hive, Pig, Sqoop, Unix Shell scripting, Data modeling, Toad, AWS Redshift, S3
Confidential
Software Engineer
Responsibilities:
- Involved in the development of an in-house tool called Mech Bench to provide access to all Honeywell and external employees.
- Gained very good exposure to OLTP systems and processes.
- Built efficient backend logic to handle data-heavy activities.
- Worked on advanced concepts like arrays, triggers, materialized views, and temporary tables.
- Worked on performance tuning and code refactoring across multiple projects.
Technologies: Oracle PLSQL, Toad, SVN, Data modeling, Unix Shell scripting, Oracle Database administration.