Sr Data Engineer Resume
Omaha, NE
SUMMARY
- 7+ years of overall IT experience, with a focus on Big Data technologies: designing and implementing data pipelines using AWS Cloud, Java, and Spark (Python/Scala).
- Expertise in Hadoop, HDFS, MapReduce and the Hadoop ecosystem, including Hive, HBase, HBase-Hive integration, Spark Core, Spark SQL, Kafka, Sqoop, and Oozie.
- Good understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ApplicationMaster, ResourceManager, NodeManager, and the MapReduce programming paradigm.
- Experience in AWS services such as EMR, S3, CloudFormation stacks, Glue, Redshift, Athena, Aurora RDS, CloudWatch, SNS, Lambda, and Step Functions.
- Good exposure to and experience with Spark, Scala, Big Data, and the AWS stack.
- Used Spark modules including Spark Core, Spark SQL, Spark Streaming, Datasets, and DataFrames.
- Used Spark DataFrame operations to validate incoming data and to perform analytics on Hive data.
- Worked on Spark Streaming and Structured Streaming with Apache Kafka for real-time data processing.
- Strong experience troubleshooting long-running Spark applications, designing highly fault-tolerant Spark applications, and fine-tuning them.
- Developed multiple Kafka producers and consumers from scratch per the software requirement specifications.
- Extracted real-time feeds using Kafka and Spark Streaming, processed the data as DataFrames, and saved it in JSON/ORC format in HDFS (see the sketch at the end of this summary).
- Experienced in shell scripting, used extensively to automate deployment and configuration-management tasks.
- Developed cross-platform products while working with Hadoop file formats such as ORC, Avro, Parquet, JSON, and delimited files.
- Analyzed data with HiveQL, Pig Latin, and MapReduce programs in Java.
- Extended Hive core functionality by implementing custom UDFs.
- Performed ad-hoc queries on structured data using HiveQL, and used partitioning, bucketing, and join techniques in Hive for faster data access.
- Imported and exported data between HDFS/Hive and relational databases using Sqoop.
- Hands-on experience creating RDDs and DataFrames for the required input data and performing data transformations using Spark and Scala.
- Hands-on experience with Hortonworks and Cloudera Hadoop environments.
- Experienced in working with NoSQL databases like Cassandra and HBase.
- Proficient in SQL: querying, data extraction/transformation, and developing queries for a wide range of applications.
- Experienced with Waterfall, Agile, and Scrum software development processes.
- Strong knowledge of version control systems such as Bitbucket and GitHub.
- Good experience in Core Java, Java EE technologies, JDBC, Servlets, and JSP.
- Active team player with excellent interpersonal skills; a keen learner with self-commitment and innovation, able to meet deadlines and handle pressure while coordinating multiple tasks.
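Below is a minimal, illustrative PySpark Structured Streaming sketch of the Kafka-to-HDFS flow described above; the broker address, topic name, event schema, and paths are assumed placeholders rather than project values.

```python
# Minimal PySpark Structured Streaming sketch: Kafka JSON feed -> ORC files on HDFS.
# Requires the spark-sql-kafka connector package on the classpath; the broker, topic,
# schema, and paths below are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kafka-to-hdfs-sketch").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_type", StringType()),
    StructField("event_ts", TimestampType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "events")                     # placeholder topic
       .load())

# Kafka delivers the value as bytes; cast to string and parse the JSON payload.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*"))

query = (events.writeStream
         .format("orc")
         .option("path", "hdfs:///data/events")            # placeholder output path
         .option("checkpointLocation", "hdfs:///checkpoints/events")
         .start())
query.awaitTermination()
```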
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, Hive, HBase, Sqoop, Yarn, Spark, Spark SQL, Kafka
Hadoop Distributions: Cloudera and AWS EMR
Languages: Java, Python, Scala
Reporting: Tableau
Operating Systems: Linux, Unix and Windows
Databases: Teradata, Oracle, DB2, SQL Server, MySQL
Build Tools: Maven, Ant, Jenkins
Version Control: Git, SVN, CVS
PROFESSIONAL EXPERIENCE
Sr Data Engineer
Confidential, Omaha, NE
Responsibilities:
- Migrated legacy IBM DataStage ETL pipelines to containerized Python applications.
- Designed and implemented a scalable PySpark data-processing framework for incoming customer information.
- Implemented data-quality checks on incoming data per business requirements in PySpark (see the sketch after this list).
- Communicated with the client and gathered requirements for change requests.
- Deployed, scheduled, and executed applications in Cybermation.
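A minimal sketch of the kind of PySpark data-quality check described above; the column names, rules, and paths are illustrative assumptions, not the actual client specification.

```python
# Minimal PySpark data-quality sketch: flag rows that fail simple business rules and
# split the feed into clean and rejected sets. Columns, rules, and paths are
# illustrative assumptions.
from functools import reduce

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("dq-checks-sketch").getOrCreate()
customers = spark.read.parquet("/data/incoming/customers")   # placeholder input path

# One boolean rule per quality check.
checks = {
    "missing_customer_id": F.col("customer_id").isNull(),
    "bad_email": ~F.col("email").rlike(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "negative_balance": F.col("balance") < 0,
}

# Add one flag column per rule, then split the data on "failed any rule".
flagged = customers
for name, condition in checks.items():
    flagged = flagged.withColumn(name, condition)

failed_any = reduce(lambda a, b: a | b, [F.col(name) for name in checks])
rejected = flagged.filter(failed_any)
clean = flagged.filter(~failed_any).drop(*checks.keys())

rejected.write.mode("overwrite").parquet("/data/rejected/customers")   # placeholder
clean.write.mode("overwrite").parquet("/data/clean/customers")         # placeholder
```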
Environment: Unix, Docker, Jenkins, Cybermation, IBM DataStage, DB2.
Sr Data Engineer
Confidential, Detroit, MI
Responsibilities:
- Worked on building centralized Data Lake on AWS Cloud utilizing primary services like S3, EMR, Redshift and Athena.
- Worked on migrating datasets and ETL workloads from on-premises systems to AWS Cloud services.
- Built a series of PySpark applications and Hive scripts to produce various analytical datasets needed by digital marketing teams.
- Worked extensively on building and automating data ingestion pipelines, moving terabytes of data from existing data warehouses to the cloud.
- Worked extensively on fine-tuning Spark applications and providing production support for various pipelines running in production.
- Worked closely with business and data science teams to ensure all requirements were translated accurately into our data pipelines.
- Worked on the full spectrum of data engineering pipelines: data ingestion, data transformation, and data analysis/consumption.
- Worked with Azure Databricks, Azure Data Factory, PySpark, and other relevant technologies in the Microsoft Azure cloud.
- Worked on automating infrastructure setup, including launching and terminating EMR clusters.
- Created Hive external tables on top of datasets loaded into S3 buckets and wrote Hive scripts to produce a series of aggregated datasets for downstream analysis (see the first sketch after this list).
- Built a real-time streaming pipeline utilizing Kafka, Spark Streaming, and Redshift.
- Created Kafka producers using the Kafka Java Producer API to connect to an external REST live-stream application and publish messages to Kafka topics (see the second sketch after this list).
- Used Talend for data integration.
- Developed microservice onboarding tools leveraging Python and Jenkins, allowing easy creation and maintenance of build jobs and Kubernetes deployments and services.
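An illustrative sketch of a Hive external table over an S3 dataset plus a small aggregation, expressed through spark.sql for consistency with the other examples; the bucket, database, table, and column names are assumed placeholders.

```python
# Sketch: Hive external table over an S3 dataset plus a small aggregation via spark.sql.
# Bucket, database, table, and column names are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-on-s3-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS marketing")
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS marketing.click_events (
        user_id    STRING,
        campaign   STRING,
        clicked_at TIMESTAMP
    )
    PARTITIONED BY (event_date STRING)
    STORED AS PARQUET
    LOCATION 's3://example-bucket/click_events/'
""")

# Register any partitions already present under the S3 prefix.
spark.sql("MSCK REPAIR TABLE marketing.click_events")

# Aggregate clicks per campaign per day for downstream analysis.
daily = spark.sql("""
    SELECT event_date, campaign, COUNT(*) AS clicks
    FROM marketing.click_events
    GROUP BY event_date, campaign
""")
daily.write.mode("overwrite").saveAsTable("marketing.daily_campaign_clicks")
```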
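The project used the Kafka Java Producer API; below is an equivalent minimal sketch in Python (kafka-python) showing the same idea of reading a REST live stream and publishing JSON messages, with a hypothetical URL, broker, and topic.

```python
# Minimal Kafka producer sketch: poll a REST live stream and publish JSON messages.
# The URL, broker address, and topic name are hypothetical placeholders; the actual
# project used the Kafka Java Producer API rather than kafka-python.
import json

import requests
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="broker:9092",                        # placeholder broker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

with requests.get("https://example.com/live-feed", stream=True) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line:
            continue
        event = json.loads(line)
        # Key by an id field (if present) so related events land in the same partition.
        producer.send("live-events",
                      key=str(event.get("id", "")).encode("utf-8"),
                      value=event)

producer.flush()
```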
Environment: AWS S3, EMR, Redshift, Athena, Glue, Spark, Python, Java, Hive, Kafka, IAM Roles
Sr Big Data Engineer
Confidential, San Francisco, CA
Responsibilities:
- Built custom input adapters for ingesting gigabytes of behavioral event logs per day from external sources such as FTP servers and S3 buckets.
- Created Sqoop scripts to import/export user profile and other lookup data from RDBMS to the S3 data store.
- Developed various Spark applications using Python (PySpark) to perform cleansing, transformation, and enrichment of the clickstream data.
- Involved in data cleansing, event enrichment, data aggregation, de-normalization, and data preparation needed for machine learning and reporting.
- Utilized the Spark RDD, DataFrame, and Spark SQL APIs to implement batch processing jobs.
- Troubleshot Spark applications to improve error tolerance and reliability.
- Fine-tuned Spark applications/jobs to improve efficiency and overall processing time of the pipelines.
- Created Kafka producers to send live-stream JSON data to various Kafka topics.
- Developed Spark Streaming applications to consume data from Kafka topics and insert the processed streams into Snowflake.
- Utilized Spark's in-memory capabilities to handle large datasets.
- Used broadcast variables in PySpark, efficient joins, transformations, and other capabilities for data processing (see the first sketch after this list).
- Created new jobs and updated existing jobs using AutoSys.
- Experienced in working with EMR clusters and S3 in the AWS cloud.
- Created Hive tables and loaded and analyzed data using Hive scripts.
- Implemented partitioning (both dynamic and static partitions) and bucketing in Hive (see the second sketch after this list).
- Involved in continuous integration of the application using Jenkins.
- Interacted with the infrastructure, network, database, application, and BA teams to ensure data quality and availability.
- Used Kubernetes to deploy, load-balance, scale, and manage Docker containers.
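A minimal sketch of a broadcast join in PySpark, the kind of join optimization mentioned above; paths and column names are illustrative.

```python
# Broadcast join sketch: a small lookup DataFrame is shipped to every executor so the
# large event DataFrame avoids a shuffle. Paths and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-sketch").getOrCreate()

events = spark.read.parquet("s3://example-bucket/events/")       # large fact data
profiles = spark.read.parquet("s3://example-bucket/profiles/")   # small lookup data

# broadcast() hints Spark to replicate the small side instead of shuffling both sides.
enriched = events.join(broadcast(profiles), on="user_id", how="left")
enriched.write.mode("overwrite").parquet("s3://example-bucket/enriched_events/")
```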
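The bullet above refers to Hive partitioning and bucketing; the following sketch shows the same partition-plus-bucket layout written through Spark's DataFrame writer rather than raw HiveQL, with assumed table and column names.

```python
# Sketch of writing a partitioned, bucketed table from PySpark. This uses Spark's own
# writer (not Hive DDL); the database, table, and column names are assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("partition-bucket-sketch")
         .enableHiveSupport()
         .getOrCreate())

events = spark.table("analytics.events_staging")   # assumed existing staging table

# Partition by date (one directory per event_date value, derived from the data itself)
# and bucket by user_id so joins and lookups on user_id read fewer files.
(events.write
    .mode("overwrite")
    .format("orc")
    .partitionBy("event_date")
    .bucketBy(32, "user_id")
    .sortBy("user_id")
    .saveAsTable("analytics.events_bucketed"))
```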
Environment: Spark, Kafka, Hive, Java, Scala, S3, EMR, Redshift, Athena, Glue
Big Data Engineer
Confidential, NYC, NY
Responsibilities:
- Used Sqoop to import and export data between Oracle/PostgreSQL and HDFS for analysis.
- Migrated existing MapReduce programs to Spark models using Python.
- Migrated data from the data lake (Hive) into S3 buckets.
- Performed data validation between the data present in the data lake and the S3 buckets.
- Used the Spark DataFrame API on the Cloudera platform to perform analytics on Hive data.
- Designed batch-processing jobs using Apache Spark, achieving roughly ten-fold speedups over the equivalent MapReduce jobs.
- Used Kafka for real-time data ingestion.
- Created separate Kafka topics and read data from them.
- Moved data from S3 buckets into the Snowflake data warehouse for generating reports.
- Wrote Hive queries for data analysis to meet business requirements.
- Migrated an existing on-premises application to AWS.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting.
- Created many Spark UDFs and Hive UDAFs for functions not available out of the box in Hive and Spark SQL (see the sketch after this list).
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Implemented performance-optimization techniques such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Good knowledge of Spark tuning parameters such as memory, cores, and executors.
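A minimal PySpark UDF sketch along the lines of the custom functions mentioned above; the masking rule and names are illustrative assumptions.

```python
# Minimal PySpark UDF sketch: register a Python function for use from DataFrames and
# Spark SQL when no built-in covers the logic. The masking rule is an assumption.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-sketch").getOrCreate()

def mask_account(account_number):
    """Keep only the last four characters of an account number."""
    if account_number is None:
        return None
    return "*" * max(len(account_number) - 4, 0) + account_number[-4:]

mask_account_udf = udf(mask_account, StringType())
spark.udf.register("mask_account", mask_account, StringType())   # usable in SQL too

accounts = spark.createDataFrame([("1234567890",), (None,)], ["account_number"])
accounts.select(mask_account_udf("account_number").alias("masked")).show()
spark.sql("SELECT mask_account('9876543210') AS masked").show()
```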
Environment: Apache Hadoop Framework, HDFS, YARN, HIVE, HBASE, AWS (S3, EMR), Scala, Spark, SQOOP
Java Developer
Confidential
Responsibilities:
- Involved in requirements analysis and the design of an object-oriented domain model.
- Involved in detailed documentation and writing functional specifications for the module.
- Involved in development of the application with Java and J2EE technologies.
- Developed and maintained a services-based architecture utilizing open-source technologies such as Hibernate ORM and the Spring Framework.
- Developed server-side services using Java multithreading, Struts MVC, EJB, Spring, and web services (SOAP, WSDL, Axis).
- Responsible for developing the DAO layer using Spring MVC and Hibernate configuration XMLs, and for managing CRUD operations (insert, update, and delete).
- Designed, developed, and implemented JSPs in the presentation layer for the Submission, Application, and Reference implementations.
- Developed JavaScript for client-side data-entry and front-end validation.
- Deployed web, presentation, and business components on the Apache Tomcat application server.
- Developed PL/SQL procedures for different use-case scenarios.
- Involved in post-production support and testing; used JUnit for unit testing of the module.
Environment: Java/J2EE, JSP, XML, Spring Framework, Hibernate, Eclipse (IDE), JavaScript, Ant, SQL, PL/SQL, Oracle, Windows, UNIX, SOAP, Jasper Reports.