Sr Data Engineer Resume

Omaha, NE

SUMMARY

  • 7+ years of overall IT experience in Big Data technologies, including designing and implementing data pipelines with AWS Cloud, Java, and Spark (Python/Scala).
  • Expertise in Hadoop, HDFS, MapReduce and the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Spark Core, Spark SQL, Kafka, Sqoop, Oozie and the MapReduce framework.
  • Good understanding of Hadoop architecture and its components, such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Application Master, Resource Manager, Node Manager and the MapReduce programming paradigm.
  • Experience in AWS services such as EMR, S3, CloudFormation stacks, Glue, Redshift, Athena, Aurora (RDS), CloudWatch, SNS, Lambda and Step Functions.
  • Good exposure and experience in Spark, Scala, Big Data and AWS Stack.
  • Used different Spark modules such as Spark Core, Spark SQL, Spark Streaming, Datasets and DataFrames.
  • Used Spark DataFrame operations to perform required validations and analytics on Hive data.
  • Worked on Spark Streaming and Spark Structured Streaming with Apache Kafka for real-time data processing.
  • Strong experience troubleshooting long-running Spark applications, designing highly fault-tolerant Spark applications and fine-tuning Spark applications.
  • Responsible for developing multiple Kafka Producers and Consumers from scratch as per the software requirement specifications.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to DataFrames, processed the data and saved the results in JSON/ORC format in HDFS (a sketch of this pattern follows this list).
  • Have experience in Shell Scripting and used it extensively to automate deployment and configuration management tasks.
  • Developed various cross-platform products while working with different Hadoop file formats such as ORC, Avro, Parquet, JSON and delimited files.
  • Analyzed data using HiveQL, Pig Latin and MapReduce programs in Java.
  • Extended Hive core functionality by implementing custom UDFs.
  • Performed ad-hoc queries on structured data using HiveQL, and used partitioning, bucketing and join techniques in Hive for faster data access.
  • Importing and exporting data into HDFS and HIVE using Sqoop.
  • Good hands-on experience creating RDDs and DataFrames for the required input data and performing data transformations using Spark and Scala.
  • Hands-on experience with Hortonworks and Cloudera Hadoop environments.
  • Experienced in working with NoSQL databases like Cassandra and HBase.
  • Proficient SQL experience in querying, data extraction/transformations and developing queries for a wide range of applications.
  • Experienced in using waterfall, Agile and Scrum models of software development process.
  • Strong knowledge of version control systems like Bitbucket and GitHub.
  • Good level of experience in Core Java, JEE technologies, JDBC, Servlets and JSP.
  • Active team player with excellent interpersonal skills, keen learner with self-commitment & innovation. Ability to meet deadlines and handle pressure in coordinating multiple tasks in the work environment.
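
For illustration, a minimal PySpark Structured Streaming sketch of the Kafka-to-HDFS pattern described above; the broker address, topic name, schema and output paths are hypothetical placeholders, not details from any specific engagement.

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Hypothetical schema for the incoming JSON events
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

# Requires the spark-sql-kafka connector on the application classpath
spark = SparkSession.builder.appName("kafka-to-hdfs").getOrCreate()

# Read the real-time feed from Kafka as a streaming DataFrame
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")  # placeholder broker
       .option("subscribe", "events")                      # placeholder topic
       .load())

# Convert the Kafka value payload into typed columns
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

# Persist the processed stream to HDFS in ORC format
query = (events.writeStream
         .format("orc")
         .option("path", "hdfs:///data/events/orc")           # placeholder path
         .option("checkpointLocation", "hdfs:///chk/events")  # placeholder path
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()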

TECHNICAL SKILLS

Hadoop/Big Data Technologies: HDFS, Hive, HBase, Sqoop, Yarn, Spark, Spark SQL, Kafka

Hadoop Distributions: Cloudera and AWS EMR

Languages: Java, Python, Scala

Reporting: Tableau

Operating Systems: Linux, Unix and Windows

Databases: Teradata, Oracle, DB2, SQL Server, MySQL

Build Tools: Maven, Ant, Jenkins

Version Control: Git, SVN, CVS

PROFESSIONAL EXPERIENCE

Sr Data Engineer

Confidential, Omaha, NE

Responsibilities:

  • Migrated legacy IBM DataStage ETL pipelines to containerized Python applications.
  • Designed and implemented a scalable PySpark data-processing framework for incoming customer information.
  • Implemented data-quality checks on the incoming data in PySpark as per business requirements (a sketch of this pattern follows this list).
  • Communicated with the client and gathered requirements for change requests.
  • Deployed, scheduled and executed applications in Cybermation.
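
A minimal sketch of the kind of PySpark data-quality rules referred to above; the input path, column names and rules are hypothetical examples, not the project's actual checks.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("customer-dq-checks").getOrCreate()

# Hypothetical incoming customer feed (placeholder path and columns)
customers = spark.read.parquet("/data/incoming/customers/")
required_columns = ["customer_id", "email", "created_date"]

# Rule 1: required fields must not be null
null_counts = customers.select([
    F.sum(F.col(c).isNull().cast("int")).alias(c) for c in required_columns
]).collect()[0].asDict()

# Rule 2: customer_id must be unique
duplicate_ids = (customers.groupBy("customer_id")
                 .count()
                 .filter("count > 1")
                 .count())

failed = [c for c, n in null_counts.items() if n > 0]
if failed or duplicate_ids > 0:
    raise ValueError(
        f"Data quality check failed: nulls in {failed}, {duplicate_ids} duplicate customer_id values")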

Environment: Unix, Docker, Jenkins, Cybermation, IBM DataStage, DB2.

Sr Data Engineer

Confidential, Detroit, MI

Responsibilities:

  • Worked on building centralized Data Lake on AWS Cloud utilizing primary services like S3, EMR, Redshift and Athena.
  • Worked on migrating datasets and ETL workloads from On-prem to AWS Cloud services.
  • Built a series of PySpark applications in Python, along with Hive scripts, to produce various analytical datasets needed for digital marketing teams.
  • Worked extensively on building and automating data ingestion pipelines and moving terabytes of data from existing data warehouses to cloud.
  • Worked extensively on fine tuning spark applications and providing production support to various pipelines running in production.
  • Worked closely with business teams and data science teams and ensured all the requirements are translated accurately into our data pipelines.
  • Worked on full spectrum of data engineering pipelines: data ingestion, data transformations and data analysis/consumption.
  • Worked with Azure Databricks, Azure Data Factory, PySpark and other relevant technologies in the Microsoft Azure cloud.
  • Worked on automating infrastructure setup, including launching and terminating EMR clusters.
  • Created Hive external tables on top of datasets loaded into S3 buckets and created various Hive scripts to produce a series of aggregated datasets for downstream analysis (a sketch of this pattern follows this list).
  • Built a real-time streaming pipeline utilizing Kafka, Spark Streaming and Redshift.
  • Created Kafka producers using the Kafka Java Producer API to connect to an external REST live-stream application and produce messages to Kafka topics.
  • Used Talend for data integration.
  • Developed microservice onboarding tools leveraging Python and Jenkins, allowing easy creation and maintenance of build jobs and Kubernetes deployments and services.
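
A minimal sketch of the Hive-external-table-over-S3 pattern referred to above, expressed through spark.sql; the database, table, bucket and column names are hypothetical placeholders.

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("s3-external-tables")
         .enableHiveSupport()
         .getOrCreate())

# External table over a dataset already landed in S3 (all names are placeholders)
spark.sql("CREATE DATABASE IF NOT EXISTS marketing")
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS marketing.clickstream (
        user_id  STRING,
        page     STRING,
        event_ts TIMESTAMP
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
    LOCATION 's3://example-bucket/clickstream/'
""")
spark.sql("MSCK REPAIR TABLE marketing.clickstream")  # register partitions already in S3

# Aggregated dataset for downstream marketing analysis
daily_visits = spark.sql("""
    SELECT event_date, page, COUNT(DISTINCT user_id) AS unique_visitors
    FROM marketing.clickstream
    GROUP BY event_date, page
""")
daily_visits.write.mode("overwrite").parquet("s3://example-bucket/agg/daily_visits/")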

Environment: AWS S3, EMR, Redshift, Athena, Glue, Spark, Python, Java, Hive, Kafka, IAM Roles

Sr BigData Engineer

Confidential, SFO, CA

Responsibilities:

  • Built custom input adapters for ingesting gigabytes of behavioural event logs from external sources such as FTP servers and S3 buckets on a daily basis.
  • Created Sqoop scripts to import/export user profile and other lookup data from RDBMS to S3 data store.
  • Developed various Spark applications using Python (PySpark) to perform cleansing, transformation and enrichment of this clickstream data.
  • Involved in data cleansing, event enrichment, data aggregation, de-normalization and data preparation needed for machine learning and reporting.
  • Utilized the Spark RDD, DataFrame and Spark SQL APIs to implement batch processing jobs.
  • Troubleshooting Spark applications for improved error tolerance and reliability.
  • Fine-tuning spark applications/jobs to improve the efficiency and overall processing time for the pipelines.
  • Created Kafka producers to send live-stream JSON data into various Kafka topics.
  • Developed Spark-Streaming applications to consume the data from Kafka topics and to insert the processed streams to Snowflake.
  • Utilized Spark in Memory capabilities, to handle large datasets.
  • Used broadcast variables, effective and efficient joins, transformations and other PySpark capabilities for data processing (a sketch of the broadcast pattern follows this list).
  • Created new jobs and updated existing jobs using AutoSys.
  • Experienced in working with EMR cluster and S3 in AWS cloud.
  • Created Hive tables, then loaded and analyzed data using Hive scripts.
  • Implemented partitioning (both dynamic and static partitions) and bucketing in Hive.
  • Involved in continuous integration of the application using Jenkins.
  • Interacted with the infrastructure, network, database, application and BA teams to ensure data quality and availability.
  • Used Kubernetes to deploy, load-balance, scale and manage Docker containers.
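
A minimal sketch of the PySpark broadcast usage referred to above; the datasets, paths and the country-code lookup are hypothetical examples rather than the project's actual data.

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast, col, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("clickstream-enrichment").getOrCreate()

# Placeholder inputs: a large event dataset and a small user-profile lookup
events = spark.read.orc("s3://example-bucket/events/")
profiles = spark.read.parquet("s3://example-bucket/lookup/user_profiles/")

# Broadcast join: ships the small lookup to every executor instead of shuffling the large side
enriched = events.join(broadcast(profiles), on="user_id", how="left")

# Classic broadcast variable: a small in-memory mapping reused inside a UDF
country_names = spark.sparkContext.broadcast({"US": "United States", "DE": "Germany"})
lookup_country = udf(lambda code: country_names.value.get(code), StringType())
enriched = enriched.withColumn("country_name", lookup_country(col("country_code")))

enriched.write.mode("overwrite").parquet("s3://example-bucket/enriched/events/")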

Environment: Spark, Kafka, Hive, Java, Scala, S3, EMR, Redshift, Athena, Glue

BigData Engineer

Confidential, NYC, NY

Responsibilities:

  • Used Sqoop to import and export data between Oracle/PostgreSQL and HDFS for analysis.
  • Migrated Existing MapReduce programs to Spark Models using Python.
  • Migrating the data from Data Lake (Hive) into S3 Bucket.
  • Performed data validation between the data present in the data lake and the S3 bucket.
  • Used Spark Data Frame API over Cloudera platform to perform analytics on Hive data.
  • Designed batch processing jobs using Apache Spark to increase processing speed ten-fold compared to the equivalent MapReduce jobs.
  • Used Kafka for real time data ingestion.
  • Created different Kafka topics and read data from them.
  • Moved data from the S3 bucket to the Snowflake data warehouse for generating reports.
  • Wrote Hive queries for data analysis to meet business requirements.
  • Migrated an existing on-premises application to AWS.
  • Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Created many Spark UDFs and Hive UDAFs for functions that did not already exist in Hive and Spark SQL (a sketch of this pattern follows this list).
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Implemented different performance optimization techniques such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Good knowledge of Spark tuning parameters such as memory, cores and executors.
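
A minimal PySpark sketch of defining and registering a custom UDF for use from both the DataFrame API and Spark SQL, in the spirit of the custom UDFs mentioned above (the project itself used Scala and Hive; the masking function and the users table here are hypothetical).

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = (SparkSession.builder
         .appName("custom-udfs")
         .enableHiveSupport()
         .getOrCreate())

def mask_email(email):
    """Hypothetical helper: keep the domain, mask the local part of an email address."""
    if email is None or "@" not in email:
        return None
    local, domain = email.split("@", 1)
    return local[0] + "***@" + domain

# Register once for the DataFrame API and once for Spark SQL / Hive queries
mask_email_udf = udf(mask_email, StringType())
spark.udf.register("mask_email", mask_email, StringType())

# DataFrame usage (assumes a 'users' table with an 'email' column exists in the metastore)
users = spark.table("users")
masked_df = users.withColumn("email", mask_email_udf(col("email")))

# Spark SQL usage of the same function
masked_sql = spark.sql("SELECT user_id, mask_email(email) AS email FROM users")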

Environment: Apache Hadoop, HDFS, YARN, Hive, HBase, AWS (S3, EMR), Scala, Spark, Sqoop

Java Developer

Confidential

Responsibilities:

  • Involved in requirements analysis and the design of an object-oriented domain model.
  • Involved in detailed documentation and writing the functional specifications of the module.
  • Involved in development of Application with Java and J2EE technologies.
  • Developed and maintained an elaborate services-based architecture utilizing open-source technologies such as Hibernate ORM and the Spring Framework.
  • Developed server-side services using Java multithreading, Struts MVC, EJB, Spring and web services (SOAP, WSDL, Axis).
  • Responsible for developing the DAO layer using Spring MVC and Hibernate configuration XMLs, and for managing CRUD operations (insert, update and delete).
  • Designed, developed and implemented JSPs in the presentation layer for the Submission, Application and Reference implementation.
  • Developed JavaScript for client-side data-entry and front-end validation.
  • Deployed Web, presentation, and business components on Apache Tomcat Application Server.
  • Developed PL/SQL procedures for different use case scenarios.
  • Involved in post-production support and testing, and used JUnit for unit testing of the module.

Environment: Java/J2EE, JSP, XML, Spring Framework, Hibernate, Eclipse (IDE), JavaScript, Ant, SQL, PL/SQL, Oracle, Windows, UNIX, SOAP, Jasper Reports.
