Sr Data Engineer Resume
Pataskala, Ohio
SUMMARY
- 7 years of strong experience building end-to-end data pipelines using PySpark, Python, and AWS services, with an in-depth understanding of distributed systems architecture and parallel processing frameworks.
- Experience writing complex SQL queries and creating reports and dashboards.
- Proficient with Unix-based command-line interfaces; expertise with ETL tools such as Informatica.
- Designed and set up an Enterprise Data Lake to support various use cases, including analytics, processing, storage, and reporting of voluminous, rapidly changing data.
- Responsible for maintaining quality data at the source by performing operations such as cleaning and transformation, and for ensuring integrity in a relational environment by working closely with the stakeholders and solution architect.
- Designed and developed a security framework to provide fine-grained access to objects in AWS S3 using AWS Lambda and DynamoDB.
- Set up and worked on Kerberos authentication principals to establish secure network communication on the cluster, and tested HDFS, Hive, Pig, and MapReduce access for new users.
- Performed end-to-end architecture and implementation assessments of various AWS services such as Amazon EMR, Redshift, and S3.
- Implemented machine learning algorithms in Python to predict the quantity a user might want to order for a specific item so it can be suggested automatically, using Kinesis Firehose and an S3 data lake.
- Strong experience using PySpark, HDFS, MapReduce, Hive, Pig, Spark, Sqoop, Oozie, and HBase.
- Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance
- Experienced working with various Hadoop Distributions (Cloudera, Hortonworks, MapR, Amazon EMR) to fully implement and leverage new features
- Experience in developing Spark applications using Spark RDD, Spark SQL, and DataFrame APIs
- Worked with real-time data processing and streaming techniques using Spark Streaming and Kafka (see the illustrative sketch at the end of this summary)
- Experience in moving data into and out of HDFS and relational database systems (RDBMS) using Apache Sqoop
- Expertise in working with the Hive data warehouse infrastructure: creating tables, distributing data by implementing partitioning and bucketing, and developing and tuning HQL queries
- Significant experience writing custom UDFs in Hive and custom Input Formats in MapReduce
- Involved in creating Hive tables, loading them with data, and writing ad-hoc Hive queries that run internally in MapReduce and Tez; replaced existing MR jobs and Hive scripts with Spark SQL and Spark data transformations for efficient data processing
- Experience developing Kafka producers and Kafka consumers for streaming millions of events per second on streaming data
- Strong understanding of real-time streaming technologies such as Spark and Kafka
- Knowledge of job workflow management and coordinating tools like Oozie
- Strong experience building end-to-end data pipelines on the Hadoop platform
- Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase
- Strong understanding of Logical and Physical database models and entity-relationship modeling
- Experience with software development tools such as JIRA, Play, and Git
- Good understanding of data modeling (dimensional and relational) concepts such as star schema modeling, snowflake schema modeling, and fact and dimension tables
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data
- Experienced in using Agile methodologies including extreme programming, SCRUM and Test-Driven Development (TDD)
- Excellent analytical, communication and interpersonal skills
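Illustrative sketch of the Kafka-to-Spark streaming pattern referenced above. This is a minimal, hypothetical example, assuming a local Kafka broker, a hypothetical "events" topic, and the spark-sql-kafka package on the classpath; it is not code from any listed engagement.

    # Minimal PySpark Structured Streaming sketch: read JSON events from Kafka,
    # parse them with an explicit schema, and print a running count per item.
    # Broker address, topic, and field names are illustrative assumptions.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import from_json, col
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.appName("kafka-stream-sketch").getOrCreate()

    schema = StructType([
        StructField("user_id", StringType()),
        StructField("item_id", StringType()),
        StructField("quantity", IntegerType()),
    ])

    events = (spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "events")
        .load()
        .select(from_json(col("value").cast("string"), schema).alias("e"))
        .select("e.*"))

    # Running count of events per item, written to the console sink for inspection.
    query = (events.groupBy("item_id").count()
        .writeStream
        .outputMode("complete")
        .format("console")
        .start())
    query.awaitTermination()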
TECHNICAL SKILLS
Operating Systems: Linux (Ubuntu, CentOS), Windows, Mac OS
Hadoop Ecosystem: Hadoop, MapReduce, Yarn, HDFS, Pig, Oozie, Zookeeper
Big Data Ecosystem: Spark, Spark SQL, Spark Streaming, Hive, Impala, Hue
Data Ingestion: Sqoop, Flume, NiFi, Kafka
NOSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: C, Scala, Core Java, J2EE (Servlets, JSP, JDBC, JavaBeans, EJB); Frameworks: MVC, Struts, Spring, Hibernate
Web Technologies: HTML, CSS, XML, JavaScript, Maven
Scripting Languages: JavaScript, UNIX shell, Python, R
Databases: Oracle 11g, MS Access, MySQL, SQL Server 2000/2005/2008/2012, Teradata
SQL Server Tools: SQL Server Management Studio, Enterprise Manager, Query Analyzer, Profiler, Export & Import (DTS).
IDE: Eclipse, Visual Studio, IDLE, IntelliJ
Web Services: Restful, SOAP
Tools: Bugzilla, Quick Test Pro (QTP) 9.2, Selenium, Quality Center, Test Link, TWS, SPSS, SAS, Documentum, Tableau, Mahout
Methodologies: Agile, UML, Design Patterns
PROFESSIONAL EXPERIENCE:
Sr Data Engineer
Confidential, Pataskala, Ohio
Responsibilities:
- Developed shell scripts that read JSON files and apply them to Sqoop and Hive jobs.
- Ingested data from relational databases (Oracle, PostgreSQL) using Sqoop into HDFS and AWS S3, loaded it into Hive tables, and transformed and analyzed large datasets by running Hive queries and using Apache Spark.
- Worked with PySpark to migrate fixed-width, ORC, CSV, and other file formats (see the illustrative sketch following this role).
- Designed and implemented an ETL framework to load data from multiple sources into Hive and from Hive into Teradata.
- Utilized Sqoop, ETL, and Hadoop File System APIs for implementing data ingestion pipelines.
- Worked on Batch data of different granularity ranging from hourly, daily to weekly and monthly.
- Handled Hadoop cluster installations in various environments such as Unix, Linux, and Windows
- Assisted in upgrading, configuring, and maintaining various Hadoop infrastructure components such as Ambari, Spark, and Hive.
- Worked with StreamSets and developed data pipelines using it.
- Developed and wrote SQL and stored procedures in Teradata; loaded data into Snowflake and wrote SnowSQL scripts.
- Wrote TDCH scripts for full and incremental refreshes of Hadoop tables.
- Optimized Hive queries by parallelizing with partitioning and bucketing.
- Worked on various data formats like AVRO, Sequence File, JSON, Map File, Parquet and ORC.
- Worked extensively on Teradata, Hadoop Hive, Spark, SQL, and PL/SQL
- Designed and published visually rich and intuitive StreamSets pipelines to migrate data
- Experienced in working with SQL, T-SQL, PL/SQL scripts, views, indexes, stored procedures, and other components of database applications
- Experienced in working with Hadoop from the Hortonworks Data Platform and running services through Cloudera Manager
- Used Agile Scrum methodology/ Scrum Alliance for development
Environment: Hadoop, HDFS, AWS, Vertica, Scala, Kafka, MapReduce, YARN, Spark, Hive, MySQL, Kerberos, Maven, StreamSets.
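Illustrative PySpark sketch of the file-migration pattern described in this role. This is a minimal example under assumed names; the paths, schema handling, and table names are illustrative, not project specifics.

    # Hedged sketch: load a delimited extract with PySpark and write it to a
    # partitioned, Hive-managed ORC table. Paths and names are illustrative.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import to_date, col

    spark = (SparkSession.builder
        .appName("file-migration-sketch")
        .enableHiveSupport()
        .getOrCreate())

    raw = (spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("hdfs:///landing/orders/*.csv"))

    # Normalize the date column so it can serve as the partition key.
    cleaned = raw.withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))

    (cleaned.write
        .mode("overwrite")
        .partitionBy("order_date")
        .format("orc")
        .saveAsTable("analytics.orders"))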
Sr. Hadoop/Big Data Engineer
Confidential, Tampa, FL
Responsibilities:
- Set up a data lake in Google Cloud using Google Cloud Storage, BigQuery, and Bigtable.
- Created shell scripts to process the raw data and load it into AWS S3 and Redshift databases.
- Planned and designed the data warehouse in a star schema; designed the table structures and documented them.
- Designed and implemented an end-to-end big data platform on the Teradata Appliance.
- Performed ETL from multiple sources such as Kafka, NiFi, Teradata, and DB2 using Hadoop Spark.
- Involved in developing the architecture solution for the project's data migration.
- Developed Python, Bash scripts to automate and provide Control flow.
- Moved data from Teradata to the Hadoop cluster using TDCH/FastExport and Apache NiFi.
- Worked with PySpark to perform ETL and generate reports.
- Wrote regression SQL to merge the validated data into the production environment.
- Developed Python, PySpark, and Bash scripts to transform and load data across on-premises and cloud platforms.
- Wrote UDFs in PySpark to perform transformations and loads (see the illustrative sketch at the end of this role).
- Used NiFi to load data into HDFS as ORC files.
- Wrote TDCH scripts and used Apache NiFi to load data from mainframe DB2 to the Hadoop cluster.
- Worked with Google Cloud Storage; researched and developed strategies to minimize cost in Google Cloud.
- Used Apache Solr for search operations on data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Worked with multiple sources; migrated tables from Teradata and DB2 to the Hadoop cluster.
- Performed source analysis, tracing the data back to its sources and finding its roots through Teradata, DB2, etc.
- Identified the jobs that load the source tables and documented them.
- Actively participated in the Agile Scrum process with two-week sprints.
- Worked with Jira and Microsoft Planner to track the progress of the project.
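Illustrative PySpark UDF sketch for the transform-and-load work above. A minimal example: the function, column, path, and table names are assumptions, not project code.

    # Hedged sketch of a PySpark UDF applied during a transform step, with the
    # result persisted to HDFS as ORC. All names are illustrative.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf, col
    from pyspark.sql.types import StringType

    spark = (SparkSession.builder
        .appName("udf-sketch")
        .enableHiveSupport()
        .getOrCreate())

    def normalize_account(acct):
        # Strip separators and left-pad to a fixed 12-character width.
        return acct.replace("-", "").zfill(12) if acct else None

    normalize_udf = udf(normalize_account, StringType())

    accounts = spark.table("staging.accounts")
    curated = accounts.withColumn("account_id", normalize_udf(col("account_id")))

    # Write back to HDFS as ORC, mirroring the NiFi landing format used here.
    curated.write.mode("overwrite").orc("hdfs:///curated/accounts")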
Big Data Engineer
Confidential, Greenwood Village, CO
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop
- Used PySpark Streaming APIs to perform transformations and actions on the fly to build the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra
- Loaded data into PySpark DataFrames and Spark RDDs and performed advanced procedures such as text analytics and processing, using Spark's in-memory computation capabilities with Scala to generate the output response
- Handled large datasets using partitions, Spark in-memory capabilities, broadcasts in PySpark, and effective, efficient joins and transformations during the ingestion process itself
- Developed Scala scripts using both DataFrame/SQL and RDD/MapReduce APIs for data aggregation and queries, and wrote data back into the OLTP system through Sqoop
- Worked with Impala and Kudu to create a Spark-to-Impala/Kudu data ingestion tool
- Performance-tuned Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings
- Optimized existing algorithms in Hadoop using SparkSession, Spark SQL, DataFrames, and pair RDDs
- Used the DataStax Spark Cassandra Connector to load data into Cassandra and used CQL to analyze data from Cassandra tables for quick searching, sorting, and grouping (see the illustrative sketch following this role)
- Designed, developed, and maintained data integration programs in a Hadoop and RDBMS environment with both traditional and non-traditional source systems
- Experience in writing Sqoop scripts for importing and exporting data between RDBMS and HDFS
- Ingested data from RDBMS, performed data transformations, and then exported the transformed data to Cassandra for data access and analysis
- Created Hive tables for loading and analyzing data, implemented partitions and buckets, and developed Hive queries to process the data and generate data cubes for visualization
- Implemented schema extraction for Parquet and Avro file Formats in Hive
- Developed Hive scripts in HiveQL to de-normalize and aggregate the data
- Used Spark API over Cloudera Hadoop Yarn to perform analytics on data in Hive
- Worked on a POC to compare the processing time of Impala with Apache Hive for batch applications, in order to implement the former in the project
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala
- Experience in job management using the Fair Scheduler; developed job processing scripts using Oozie workflows
- Experience in NoSQL Column-Oriented Databases like Cassandra and its Integration with Hadoop cluster
- Worked with the BI team to create various kinds of reports in Tableau based on the client's needs
- Experience in querying Parquet files by loading them into Spark DataFrames using a Zeppelin notebook
- Experience in troubleshooting problems that arise during batch data processing jobs
- Extracted data from Teradata into HDFS/dashboards using Spark Streaming
- Migrated an existing on-premises application to AWS; used AWS services such as EC2 and S3 for small data set processing and storage, and maintained the Hadoop cluster on AWS EMR
Environment: Hadoop Yarn, Spark-Core, Spark-Streaming, Spark-SQL, Scala, Python, Kafka, Hive, Sqoop, Amazon AWS, Elastic Search, Impala, Cassandra, Tableau, Talend, Cloudera, MySQL, Linux, Shell scripting
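Illustrative sketch of the DataStax Spark Cassandra Connector write path referenced in this role. A minimal example: the connection host, keyspace, table, and source path are assumptions, and the connector package must be available on the Spark classpath.

    # Hedged sketch: write a DataFrame to Cassandra via the DataStax connector.
    # Host, keyspace, table, and source path are illustrative assumptions.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
        .appName("cassandra-write-sketch")
        .config("spark.cassandra.connection.host", "127.0.0.1")
        .getOrCreate())

    learners = spark.read.orc("hdfs:///curated/learners")

    (learners.write
        .format("org.apache.spark.sql.cassandra")
        .options(keyspace="analytics", table="learner_profile")
        .mode("append")
        .save())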
Hadoop Developer
Confidential, New York, NY
Responsibilities:
- Developed Hive and Bash scripts for source data validation and transformation; automated data loading into HDFS and Hive for pre-processing the data using One Automation.
- Gathered data from data warehouses in Teradata and Snowflake.
- Developed Spark/Scala and Python code for a regular expression project in the Hadoop/Hive environment.
- Designed and implemented an ETL framework to load data from multiple sources into Hive and from Hive into Teradata.
- Generated reports using Tableau.
- Experience building Big Data applications using Cassandra and Hadoop
- Utilized SQOOP, ETL and Hadoop File System APIs for implementing data ingestion pipelines
- Worked on Batch data of different granularity ranging from hourly, daily to weekly and monthly.
- Hands on experience in Hadoop administration and support activities for installations and configuring Apache Big Data Tools and Hadoop clusters using Cloudera Manager
- Handled Hadoop cluster installations in various environments such as Unix, Linux and Windows
- Assisted in upgrading, configuration, and maintenance of various Hadoop infrastructures like Ambari, PIG, and Hive.
- Developed and wrote SQL and stored procedures in Teradata; loaded data into Snowflake and wrote SnowSQL scripts.
- Wrote TDCH scripts for full and incremental refreshes of Hadoop tables.
- Optimized Hive queries by parallelizing with partitioning and bucketing (see the illustrative sketch following this role).
- Worked on various data formats like AVRO, Sequence File, JSON, Map File, Parquet and ORC.
- Worked extensively on Teradata, Hadoop Hive, Spark, SQL, PL/SQL, and SnowSQL
- Designed and published visually rich and intuitive Tableau dashboards and crystal reports for executive decision making
- Experienced in working with SQL, T-SQL, PL/SQL scripts, views, indexes, stored procedures, and other components of database applications
- Experienced in working with Hadoop from the Hortonworks Data Platform and running services through Cloudera Manager
- Used Agile Scrum methodology/ Scrum Alliance for development
Environment: Hadoop, HDFS, AWS, Vertica, Bash, Scala, Kafka, MapReduce, YARN, Drill, Spark, Pig, Hive, Python, Java, NiFi, HBase, MySQL, Kerberos, Maven, Shell Scripting, SQL.
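Illustrative sketch of the partitioning and bucketing pattern referenced in this role. A minimal example under assumed names; database, table, and column names are illustrative, not project code.

    # Hedged sketch: write a fact table partitioned by date (for partition
    # pruning) and bucketed by customer_id (to reduce shuffle in joins).
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
        .appName("partition-bucket-sketch")
        .enableHiveSupport()
        .getOrCreate())

    sales = spark.table("staging.sales_raw")

    (sales.write
        .mode("overwrite")
        .partitionBy("sale_date")
        .bucketBy(32, "customer_id")
        .sortBy("customer_id")
        .format("orc")
        .saveAsTable("dw.sales_fact"))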