
Sr. Data Engineer Resume


Irving, TX

SUMMARY:

  • Overall 9 years of experience in analysis, design, development, testing, implementation, maintenance, and enhancement of various IT projects, including around 7+ years of experience in Big Data implementing end-to-end Hadoop solutions.
  • Hands-on experience installing, configuring, and using Apache Hadoop ecosystem components such as the Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, ZooKeeper, Sqoop, Kafka, and Storm.
  • Expertise in writing Hadoop jobs to analyze data using MapReduce, Hive, Pig, and Solr.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and vice-versa.
  • In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MRv1 and MRv2 (YARN).
  • Experience in analyzing data using Hive QL, Pig Latin, and custom MapReduce programs in Java.
  • Experience in job workflow scheduling and monitoring tools like Oozie and Zookeeper.
  • Hands-on experience configuring Hadoop clusters in an enterprise environment, on VMware, and on Amazon Web Services (AWS) using EC2 instances.
  • Experience configuring and working on AWS EMR Instances.
  • Proficient in using Cloudera Manager, an end-to-end tool to manage Hadoop operations.
  • Used Talend big data components such as Hive, Pig, and Sqoop.
  • Worked on the implementation of a log producer in Scala that watches for application logs, transforms incremental logs, and sends them to a Kafka- and ZooKeeper-based log collection platform.
  • Good understanding of NoSQL databases like Cassandra and HBase.
  • Experience on developing REST Services.
  • Wrote Spark Streaming applications against the Big Data distribution in an active cluster environment (see the streaming sketch after this list).
  • Expertise in core Java, J2EE, multithreading, JDBC, Spring, and shell scripting, and proficient in using Java APIs for application development.
  • Experience in scraping data using Kapow and Blue Prism.
  • Experience in cloud data migration using AWS and Snowflake.
  • Used Teradata SQL Assistant to view Hive tables.
  • Installed and configured Apache Airflow for workflow management and created workflows in Python.
  • Experience in database design, entity relationships, database analysis, SQL programming, and PL/SQL stored procedures.
  • Good experience with continuous integration of applications using Jenkins.
  • Experienced with Splunk and the ELK stack (Elasticsearch, Logstash, and Kibana) for centralized logging; stored logs and metrics in an S3 bucket using a Lambda function and used AWS Lambda to run code in AWS without managing servers.
  • Experience coding and testing crawlers and standardization, normalization, load, extract, and Avro models to filter/massage the data and validate it.
  • Excellent domain knowledge in Insurance and Finance. Excellent interpersonal, analytical, verbal and written communications skills.
  • Excellent working experience in Scrum / Agile framework and Waterfall project execution methodologies.
  • Worked on various operating systems, including UNIX/Linux, macOS, and Windows.
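Below is a minimal sketch of the kind of Spark Streaming work referenced above: a PySpark Structured Streaming job that reads application log events from Kafka and lands them in HDFS. The broker address, topic name, schema fields, and paths are illustrative placeholders, and the job assumes the spark-sql-kafka package is available at submit time.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StringType

spark = SparkSession.builder.appName("app-log-stream-sketch").getOrCreate()

# Hypothetical schema for the incoming application log events.
log_schema = (StructType()
              .add("app", StringType())
              .add("level", StringType())
              .add("message", StringType()))

# Read raw log events from a Kafka topic (placeholder broker and topic).
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "app-logs")
       .load())

# Kafka delivers key/value as binary; cast the value and parse the JSON payload.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), log_schema).alias("e"))
          .select("e.*"))

# Append the parsed events to HDFS as Parquet, with a checkpoint for fault tolerance.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/app_logs")
         .option("checkpointLocation", "hdfs:///checkpoints/app_logs")
         .outputMode("append")
         .start())

query.awaitTermination()
```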

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, YARN, Pig, Hive, HBase, Impala, Kafka, Storm, Sqoop, Spark Streaming, Spark SQL, NiFi, Oozie, Zookeeper, Amazon EC2, S3, Airflow.

Cloud: AWS.

Programming languages: Java, Scala, Python, Linux Shell Script.

Scraping Tools: Kapow, Blue Prism.

Search Engine: ELK, Splunk.

IDE Tools: Eclipse, IntelliJ IDEA.

Web Frameworks: Spring, Hibernate.

Database: Oracle, MySQL, HBase, Cassandra, Snowflake.

Web Technologies: HTML, XML, JavaScript.

EXPERIENCE:

Confidential, Irving, TX

Sr. Data Engineer

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Used the Amazon EMR distribution for the Hadoop ecosystem.
  • Developed customized UDFs in Spark for eliminating duplicate columns and identifying alphanumeric variables.
  • Developed a new application from scratch, writing code for data parsing, cleansing, and aggregation per business requirements.
  • Experienced in handling large datasets during the ingestion process itself using partitions, Spark in-memory capabilities, broadcasts, and effective and efficient joins and transformations.
  • Configured Airflow DAGs for various feeds.
  • Experience in integration testing, unit testing, functional testing and performance testing.
  • Working knowledge and experience of AWS products and services: Amazon EC2, AWS Lambda, Amazon S3, Glacier, Storage Gateway, Elastic Load Balancing, VPC, IAM, RDS, CloudFront, Terraform, CloudWatch, Auto Scaling, Amazon MySQL/DynamoDB, and CLI tools.
  • Created S3 buckets in the AWS environment to store files, some of which serve static content for a web application.
  • Created Elasticsearch queries and Python scripts to analyze data from different applications, ran it through Logstash, passed it through Elasticsearch, and visualized it in Kibana depending on the kinds of logs and database tables.
  • Hands on experience deploying code to production and performing historical loads.
  • Integrated AWS Lambda for transferring data from S3 to AWS Elasticsearch.
  • Automated the code deployment process using Jenkins.
  • Utilized Apache Spark with Python to develop and execute big data analytics and machine learning applications; executed machine learning use cases under Spark ML.
  • Developed PySpark for regular expression (regex) project in the Hadoop/Hive environment with Linux/Windows for big data resources.
  • Extracted, transformed, and loaded data sources to generate CSV data files using Python programming and SQL queries.
  • Worked on data pre-processing and cleaning the data to perform feature engineering and performed data imputation techniques for the missing values in the dataset using Python.
  • Worked on the organization-level Snowflake database setup.
  • Set up multiple Snowflake pipelines and warehouses for various data-consumption needs.
  • Set up data reader accounts and micro-partition tuning across the Snowflake database for performance.
  • Involved in creating SnowSQL scripts to extract data from S3 buckets into Snowflake tables and transform the data according to business requirements.
  • Recreated existing SQL Server objects in Snowflake.
  • Orchestrated data pipeline executions using Apache Airflow (see the DAG sketch after this list).
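A minimal sketch of an Airflow DAG like the ones described above, orchestrating an extract-transform-load sequence for a daily feed. The feed name, S3 bucket, job paths, and the snowsql load script are hypothetical placeholders, not the actual production pipeline; the import path assumes Airflow 2.x.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator  # Airflow 2.x import path

default_args = {
    "owner": "data-eng",
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

# Hypothetical daily feed pipeline: pull the feed from S3, clean it with Spark,
# then load the result into Snowflake with a snowsql script.
with DAG(
    dag_id="daily_feed_pipeline",
    default_args=default_args,
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:

    extract_feed = BashOperator(
        task_id="extract_feed",
        bash_command="aws s3 cp s3://example-feed-bucket/{{ ds }}/ /tmp/feed/{{ ds }}/ --recursive",
    )

    spark_transform = BashOperator(
        task_id="spark_transform",
        bash_command="spark-submit /opt/jobs/clean_feed.py --date {{ ds }}",
    )

    load_snowflake = BashOperator(
        task_id="load_snowflake",
        bash_command="snowsql -f /opt/sql/load_feed.sql -D run_date={{ ds }}",
    )

    extract_feed >> spark_transform >> load_snowflake
```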

Environment: HDFS, Spark, Spark SQL, Git, Python, AWS, Kafka, Cassandra, Hive, Scala, UNIX Shell Scripting, Airflow, Snowflake.

Confidential, Morrisville, NC

Sr. Data Engineer

Responsibilities:

  • Responsible for designing and implementing an end-to-end data pipeline using Big Data tools including HDFS, Hive, Sqoop, HBase, Kafka, and Spark.
  • Extracted, parsed, cleaned, and ingested incoming web feed data and server logs into HDFS, handling structured and unstructured data.
  • Imported millions of structured records from relational databases using Sqoop, processed them with Spark, and stored the data in HDFS in CSV format.
  • Good experience in writing Spark applications using Scala.
  • Used SBT to build Scala-coded Spark projects and executed them using spark-submit.
  • Used Sqoop to extract and load incremental and non-incremental data from RDBMS systems into Hadoop.
  • Worked on data cleaning and reshaping and generated segmented subsets using NumPy and pandas in Python.
  • Used Python scripts to update content in the database and manipulate files.
  • Wrote Python OOP code with attention to quality, logging, monitoring, debugging, and code optimization.
  • Developed tools to automate routine tasks using Python, shell scripting, and XML.
  • Used scraping tools such as Kapow and Blue Prism to scrape data from third-party websites and store it in S3.
  • Implemented partitioning, dynamic partitioning, and bucketing in Hive.
  • Developed Spark applications for data transformation and loading into HDFS using RDDs, DataFrames, and Datasets (see the ingestion sketch after this list).
  • Developed shell scripts for ingesting data into HDFS and partitioned the data in Hive.
  • Extensively used Hive optimization techniques like partitioning, bucketing, Map Join and parallel execution.
  • Monitored workload, job performance and capacity planning using Cloudera Manager.
  • Created DataFrames using Spark SQL and worked on loading the data into Cassandra.
  • Worked on Spark SQL and created a data warehouse for both Spark and Hive.
  • Created view tables in Impala on the same Hive tables for querying the data.
  • Used Spark SQL to process large amounts of structured data.
  • Developed a Spark Streaming application to pull data from the cloud into Hive tables.
  • Experienced with batch processing of data sources using Apache Spark.
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS.
  • Created data partitions on large data sets in S3 and DDL on partitioned data.
  • Implemented rapid provisioning and lifecycle management for Amazon EC2 using custom Bash scripts.
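A minimal sketch of the Spark ingestion pattern described above (the item marked "ingestion sketch"), assuming Sqoop has landed CSV files in HDFS and the target is a date-partitioned Hive table. The paths, column names, and table name are illustrative placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, to_date, trim

# enableHiveSupport() lets Spark write partitioned Hive tables directly.
spark = (SparkSession.builder
         .appName("web-feed-ingest-sketch")
         .enableHiveSupport()
         .getOrCreate())

# Read the CSV files that Sqoop landed in HDFS (placeholder path and columns).
feed = (spark.read
        .option("header", "true")
        .csv("hdfs:///landing/web_feed/"))

# Basic cleansing: trim the key column, drop exact duplicates,
# and derive the partition column from the event timestamp.
cleaned = (feed
           .withColumn("customer_id", trim(col("customer_id")))
           .dropDuplicates()
           .withColumn("load_date", to_date(col("event_ts"))))

# Append into a Hive table partitioned by load_date; the table is
# created in the metastore on the first write.
(cleaned.write
 .mode("append")
 .partitionBy("load_date")
 .format("parquet")
 .saveAsTable("analytics.web_feed"))
```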

Environment: Hadoop (CDH 5.12), HDFS, Spark, Spark SQL, Git, AWS, Kafka, Cassandra, Hive, Java, Scala, HBase, Maven, UNIX Shell Scripting, Kapow, Blue Prism.

Confidential, Memphis, TN

Data Engineer

Responsibilities:

  • Designed solution for Streaming data applications using Apache Storm.
  • Extensively worked on Kafka and Storm integration to score the PMML (Predictive Model Markup Language) Models.
  • Applied transformations to the dataset using Spark.
  • Created HBase tables to store data depending on column families.
  • Wrote extensive Hive queries for data analysis to meet business requirements.
  • Involved in adding and decommissioning the data nodes.
  • Responsible for analyzing and comparing Spark SQL query results with Hive query results.
  • Involved in the requirements and design phases for implementing real-time streaming using Kafka and Storm.
  • Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster testing and processing of data.
  • Used SBT to build Scala-coded Spark projects and executed them using spark-submit.
  • Used Maven for deployments and processed structured, semi-structured (such as XML), and unstructured data.
  • Involved in file movements between HDFS and AWS S3 and extensively worked with S3 bucket in AWS.
  • Created data partitions on large data sets in S3 and DDL on partitioned data.
  • Implemented rapid provisioning and lifecycle management for Amazon EC2 using custom Bash scripts.
  • Experience developing and consuming web services (REST, SOAP, JSON) and APIs in service-oriented architectures.
  • Used the Spark-Cassandra connector to load data to and from Cassandra (see the sketch after this list).
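A minimal sketch of reading from and writing to Cassandra with the Spark-Cassandra connector, as referenced in the last item above. The host, keyspace, and table names are illustrative placeholders, and the spark-cassandra-connector package is assumed to be supplied to spark-submit.

```python
from pyspark.sql import SparkSession

# The connector is configured via spark.cassandra.* options; the
# spark-cassandra-connector package must be on the classpath.
spark = (SparkSession.builder
         .appName("cassandra-io-sketch")
         .config("spark.cassandra.connection.host", "cassandra-node1")
         .getOrCreate())

# Read an existing Cassandra table into a DataFrame (placeholder keyspace/table).
scores = (spark.read
          .format("org.apache.spark.sql.cassandra")
          .options(keyspace="analytics", table="model_scores")
          .load())

# Example transformation before writing back.
high_scores = scores.filter(scores.score > 0.8)

# Append the results to another Cassandra table.
(high_scores.write
 .format("org.apache.spark.sql.cassandra")
 .options(keyspace="analytics", table="high_scores")
 .mode("append")
 .save())
```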

Environment: Hadoop (HDP 2.5.3), HDFS, Spark, Spark SQL, Git, Storm, AWS, Kafka, Hive, Java, Scala, HBase, Maven, UNIX Shell Scripting, Ranger, Kerberos, NiFi, Cassandra, REST API.

Confidential, Cincinnati, OH

Hadoop Support/Developer

Responsibilities:

  • Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Wrote MapReduce jobs to discover trends in data usage by users.
  • Involved in managing and reviewing Hadoop log files.
  • Involved in running Hadoop Streaming jobs to process terabytes of text data (see the streaming-job sketch after this list).
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data with MapReduce and Pig.
  • Wrote Pig UDFs.
  • Developed Hive queries for the analysts.
  • Automated all the jobs using Oozie workflows, from pulling data from sources such as MySQL and pushing the result datasets to the Hadoop Distributed File System, to running MapReduce and Pig/Hive jobs.
  • Monitored system health and logs and responded to any warning or failure conditions.
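A minimal sketch of a Hadoop Streaming job like the ones referenced above: a single Python script used as both mapper and reducer to count records per user in tab-delimited usage logs. The field positions, script name, and HDFS paths are hypothetical placeholders.

```python
#!/usr/bin/env python
"""Hadoop Streaming sketch: count records per user from tab-delimited usage logs.

Example invocation (paths and field positions are placeholders):

  hadoop jar hadoop-streaming.jar \
      -input /logs/usage -output /reports/usage_by_user \
      -mapper "usage_count.py map" -reducer "usage_count.py reduce" \
      -file usage_count.py
"""
import sys


def mapper():
    # Emit "user \t 1" for every input line; the user id is assumed to be column 0.
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if fields and fields[0]:
            print("%s\t1" % fields[0])


def reducer():
    # Streaming sorts mapper output by key, so counts for a user arrive contiguously.
    current_user, count = None, 0
    for line in sys.stdin:
        user, value = line.rstrip("\n").split("\t", 1)
        if user != current_user:
            if current_user is not None:
                print("%s\t%d" % (current_user, count))
            current_user, count = user, 0
        count += int(value)
    if current_user is not None:
        print("%s\t%d" % (current_user, count))


if __name__ == "__main__":
    if len(sys.argv) > 1 and sys.argv[1] == "reduce":
        reducer()
    else:
        mapper()
```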

Environment: Hadoop (CDH 4), HDFS, MapReduce, Hive, Java, Pig, Sqoop, Oozie, REST Web Services, HBase, UNIX Shell Scripting.

Confidential, Boston, MA

Java Developer

Responsibilities:

  • Involved in SDLC requirements gathering, analysis, design, development, and testing of an application developed using the Agile methodology.
  • Actively participated in object-oriented analysis and design sessions for the project, which is based on MVC architecture using the Spring Framework.
  • Involved in daily Scrum meetings, sprint planning, and estimation of tasks for user stories; participated in retrospectives and presented demos at the end of each sprint.
  • Experience in writing PL/SQL stored procedures, functions, triggers, Oracle reports, and complex SQL queries.
  • Designed and developed the entire application implementing MVC architecture.
  • Developed the front end of the application using Bootstrap, JavaScript, and the AngularJS (Model-View-Controller) framework.
  • Used the Spring Framework for IoC, JDBC, ORM, AOP, and Spring Security.
  • Proactively found and resolved issues.
  • Established efficient communication between teams to resolve issues.
  • Proposed an innovative logging approach for all interdependent applications.
  • Successfully delivered all product deliverables with zero defects.

Environment: Java, JUnit, MySQL, Spring, Struts, Web Services (SOAP, RESTful 4.0), JavaScript.
