
Senior Data Engineer Resume


Pittsburgh, PA

SUMMARY

  • 8+ years of IT experience in software development, including 6+ years as a Big Data / Hadoop developer with strong knowledge of the Hadoop framework.
  • Extensively used PySpark to build scalable data pipelines for reporting (see the PySpark pipeline sketch after this list).
  • Expertise in Hadoop architecture and its components, such as HDFS, YARN, High Availability, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
  • Experience with all aspects of development from initial implementation and requirement discovery, through release, enhancement and support (SDLC & Agile techniques).
  • 4+ years of experience in design, development, data migration, testing, support, and maintenance using Redshift databases.
  • Good working knowledge of Snowflake and Teradata databases.
  • Hands-on experience with NoSQL and Hadoop data stores such as HBase and Hive.
  • Hands-on experience streaming data using Kafka and the Spark (Scala) streaming API.
  • Worked on dimensional data modeling with Star and Snowflake schemas and Slowly Changing Dimensions (SCD).
  • 4+ years of experience with Apache Hadoop technologies such as the Hadoop Distributed File System (HDFS), the MapReduce framework, Hive, Pig, Sqoop, Oozie, HBase, Spark, Scala, and Python.
  • Hands-on experience with cloud technologies such as Databricks, HDInsight, Azure storage systems, Azure Data Factory, ADLS Gen2, Azure AD, Azure Monitor, and Azure SQL Database.
  • Keen interest in the evolving technology stack offered by Google Cloud Platform (GCP).
  • Hands-on experience with version control using Bitbucket.
  • Working experience with big data in the cloud using AWS EC2 and Microsoft Azure; handled Redshift and DynamoDB databases holding around 300 TB of data.
  • Extensive experience migrating on-premises Hadoop platforms to cloud solutions on AWS and Azure.
  • 3+ years of experience writing Python ETL frameworks and PySpark jobs to process large volumes of data daily.
  • Strong experience implementing data models and loading unstructured data using HBase, DynamoDB, and Cassandra.
  • Hands-on experience in application deployment using CI/CD pipelines.
  • Experience in implementing Spark using Scala and Spark SQL for faster processing of data.
  • Strong experience extracting and loading data with complex business logic using Hive from different data sources, and building ETL pipelines that process terabytes of data daily.
  • Experienced in transporting and processing real-time event streams using Kafka and Spark Streaming (see the streaming sketch after this list).
  • Hands-on experience importing and exporting data between relational databases and HDFS, Hive, and HBase using Sqoop.
  • Experienced in processing real-time data using Kafka 0.10.1 producers and stream processors; implemented stream processing with Kinesis, landing data into an S3 data lake.
  • Designed and developed Spark pipelines to ingest real-time event-based data from Kafka and other message queue systems, and processed large volumes with Spark batch jobs into a Hive data warehouse.
  • Experienced in creating and analyzing Software Requirement Specifications (SRS) and Functional Specification Document (FSD).
  • Worked with the Twitter Bootstrap framework to design single-page applications.
  • Strong working knowledge of developing cross-browser-compatible (IE, Firefox, Safari, Chrome, etc.) dynamic web applications.
  • Experienced in developing unit test cases using JUnit, Mockito, and ScalaTest.
  • Capable of organizing, coordinating and managing multiple tasks simultaneously.
  • Excellent communication and interpersonal skills; self-motivated, organized, and detail-oriented; able to work well under deadlines in a changing environment and perform multiple tasks effectively and concurrently.
  • Strong analytical skills with the ability to quickly understand a client's business needs. Involved in meetings to gather information and requirements from the clients.
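
A minimal sketch of the kind of PySpark reporting pipeline referenced above. The paths, table names, and column names are illustrative assumptions, not actual project artifacts.

    # Illustrative PySpark reporting pipeline: read a raw feed, apply basic
    # cleansing, and write an aggregated reporting table.
    # All paths and column names below are hypothetical.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily_reporting_pipeline").getOrCreate()

    # Read the day's raw feed (hypothetical location).
    raw = spark.read.parquet("s3://example-bucket/raw/orders/2023-01-01/")

    # Basic cleansing and derivation of reporting columns.
    orders = (
        raw.dropDuplicates(["order_id"])
           .filter(F.col("status") == "COMPLETED")
           .withColumn("order_date", F.to_date("order_ts"))
    )

    # Aggregate into a reporting-friendly shape.
    daily_revenue = (
        orders.groupBy("order_date", "region")
              .agg(F.sum("amount").alias("total_revenue"),
                   F.countDistinct("customer_id").alias("unique_customers"))
    )

    # Land the result where the BI layer can pick it up (hypothetical path).
    daily_revenue.write.mode("overwrite").partitionBy("order_date") \
        .parquet("s3://example-bucket/reporting/daily_revenue/")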
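
A hedged sketch of consuming real-time Kafka events with Spark Structured Streaming, as mentioned in the streaming bullets above. The broker, topic, schema, and sink locations are assumptions.

    # Illustrative Spark Structured Streaming job: consume JSON events from a
    # Kafka topic and append them to a data lake path.
    # Broker, topic, schema, and output locations are hypothetical.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("kafka_event_stream").getOrCreate()

    event_schema = StructType([
        StructField("event_id", StringType()),
        StructField("event_type", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_ts", TimestampType()),
    ])

    # Read the Kafka topic as a stream; the value column carries the JSON payload.
    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")
             .option("subscribe", "events_topic")
             .option("startingOffsets", "latest")
             .load()
             .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
             .select("e.*")
    )

    # Append parsed events to the lake, checkpointing stream progress.
    query = (
        events.writeStream.format("parquet")
              .option("path", "s3://example-bucket/lake/events/")
              .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
              .outputMode("append")
              .start()
    )
    query.awaitTermination()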

TECHNICAL SKILLS

Big Data Technologies: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, Spark, Kafka, NiFi, Airflow, Flume, Snowflake

Hadoop Frameworks: Cloudera CDH, Hortonworks HDP, MapR

Languages: Scala, Python, Java, SQL

Methodologies: Agile, Waterfall

Build Tools: Maven, Gradle, Jenkins

Databases: HBase, Cassandra, MongoDB, PostgreSQL, MySQL

BI Tools: Tableau

Operating System: Windows 7/8/10, Vista, UNIX, Linux, Ubuntu, Mac OS X

PROFESSIONAL EXPERIENCE

Confidential, Pittsburgh, PA

Senior Data Engineer

Responsibilities:

  • Design, develop, implement, test, document, and operate large-scale, high-volume, high-performance big data structures for business intelligence analytics.
  • Parsed and transformed complex daily data feeds for multi-destination delivery. Provided analytics support to the leadership team by proactively asking questions and rigorously analyzing some of the most important issues impacting the future of the Consumer business.
  • Provided online reporting and analysis using business intelligence tools and a logical abstraction layer against large, multi-dimensional datasets and multiple sources. Migrated enterprise data from multiple systems in various formats.
  • Designed and implemented ETL configurations, worked with large, complex datasets, enhanced query performance, and migrated jobs to HDFS to improve scaling and performance.
  • Developed a data quality framework to automate quality checks at different layers of the data transformation.
  • Built a framework to extract data files from the mainframe system and perform ETL using shell scripts and PySpark.
  • Developed deployment architecture and scripts for automated system deployment in Jenkins.
  • Involved in an end-to-end data migration project, migrating data from an on-premises Cloudera cluster to the cloud (Microsoft Azure).
  • During the migration, upgraded the code to new Spark and Python versions.
  • Performed data ingestion from multiple RDBMSs and vendor files using Qlik, storing the data in ADLS Gen2 (blob containers).
  • Used Databricks Auto Loader to load data received from vendor files, implemented with the file notification method (see the Auto Loader sketch after this list).
  • Performed data cleansing and transformations using Azure Databricks. Used Delta tables for SCD-type tables (see the MERGE sketch after this list) and Azure Monitor for logging and alerts.
  • Implemented complete job dependency and scheduling using Azure Databricks.
  • After transformation, the data is stored in data marts and used by the reporting layer. Used Snowflake to store the transformed data, which is consumed by data scientists, the reporting layer, etc.
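
A hedged Auto Loader sketch for ingesting vendor files from ADLS Gen2 with the file notification method, as referenced above. The storage account, container paths, file format, and target table are illustrative assumptions.

    # Illustrative Databricks Auto Loader ingestion in file notification mode.
    # Storage account, container names, file format, and target table are hypothetical.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # provided automatically on Databricks

    vendor_stream = (
        spark.readStream.format("cloudFiles")
             .option("cloudFiles.format", "csv")              # vendor files assumed to be CSV
             .option("cloudFiles.useNotifications", "true")   # file notification method
             .option("cloudFiles.schemaLocation",
                     "abfss://meta@examplestorage.dfs.core.windows.net/schemas/vendor_feed/")
             .option("header", "true")
             .load("abfss://landing@examplestorage.dfs.core.windows.net/vendor_feed/")
    )

    # Append newly detected files into a bronze Delta table.
    (
        vendor_stream.writeStream.format("delta")
            .option("checkpointLocation",
                    "abfss://meta@examplestorage.dfs.core.windows.net/checkpoints/vendor_feed/")
            .trigger(availableNow=True)   # process whatever has landed, then stop
            .toTable("bronze.vendor_feed")
    )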
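
A minimal Delta Lake MERGE sketch for the SCD-style tables mentioned above (a Type 1 overwrite is shown for brevity). The table and key names are assumptions.

    # Illustrative Delta Lake MERGE maintaining an SCD-style dimension
    # (Type 1: matching keys are overwritten, new keys are inserted).
    # Table names and key columns are hypothetical.
    from delta.tables import DeltaTable
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    updates = spark.table("staging.customer_updates")          # incoming cleansed batch
    dim_customer = DeltaTable.forName(spark, "silver.dim_customer")

    (
        dim_customer.alias("t")
            .merge(updates.alias("s"), "t.customer_id = s.customer_id")
            .whenMatchedUpdateAll()
            .whenNotMatchedInsertAll()
            .execute()
    )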

Confidential, NYC, NY

Hadoop/Data Engineer

Responsibilities:

  • Wrote ETL jobs using Spark data pipelines to process data from different sources and transform it for multiple targets.
  • Created streams using Spark, processed real-time data into RDDs and DataFrames, and built analytics using Spark SQL.
  • Created a test automation framework in Python.
  • Created control structures.
  • Designed Redshift based data delivery layer for business intelligence tools to operate directly on AWS S3.
  • Implemented a one-time data migration of multistate-level data from SQL Server to Snowflake using Python and SnowSQL.
  • Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators (see the DAG sketch after this list).
  • Implemented Kinesis data streams to read real-time data and loaded it into S3 for downstream processing (see the Kinesis sketch after this list).
  • Created a framework for data profiling.
  • Created a framework for data encryption.
  • Used Snowflake to create and maintain tables and views.
  • Set up AWS infrastructure on EC2 and implemented the S3 API for accessing data files in S3 buckets.
  • Designed “Data Services” to intermediate data exchange between the Data Clearinghouse and the Data Hubs.
  • Wrote ETL flows and MapReduce jobs to process data from AWS S3 into DynamoDB and HBase.
  • Involved in the ETL phase of the project; designed and analyzed the data in Oracle and migrated it to Redshift and Hive.
  • Experience in moving data between GCP and Azure using Azure Data Factory.
  • Used Spark to process live streaming data from Apache Kafka.
  • Created an ETL framework to hydrate the data lake using PySpark.
  • Developed Spark jobs using Scala and Spark SQL for faster processing of data.
  • Developed Spark streaming jobs to process terabytes of XML format data using Scala.
  • Created databases and tables in Redshift and DynamoDB and wrote complex EMR scripts to process terabytes of data into AWS S3.
  • Performed real-time analytics on transactional data using Python to create statistical models for predictive and reverse product analysis.
  • Proficient in machine learning algorithms and predictive modeling, including Linear Regression, Logistic Regression, Naive Bayes, Decision Tree, Random Forest, Gradient Boosting, SVM, KNN, and K-means clustering.
  • Developed Scala source code to process large volumes of raw JSON data.
  • Participated in client meetings, explaining the views to supporting teams and gathering requirements.
  • Worked in an Agile methodology, understanding the requirements of the user stories.
  • Prepared High-level design documentation for approval.
  • Created an ETL framework using Azure Databricks.
  • Used data visualization tools such as Tableau, QuickSight, and Kibana to draw new insights from the extracted data and represent it more effectively.
  • Created ETL pipelines using Azure Data Factory.
  • Designed data models for dynamic and real-time data intended for use by various applications with OLAP and OLTP needs.
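
A hedged sketch of the kind of Airflow DAG referenced in the pipeline bullet above. The DAG id, schedule, and task callables are illustrative placeholders; only the generic PythonOperator is shown.

    # Illustrative Airflow DAG for a daily ETL job; the DAG id, schedule, and
    # the extract/transform/load callables are hypothetical placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract(**context):
        print("pull data from source systems")    # placeholder

    def transform(**context):
        print("apply business transformations")   # placeholder

    def load(**context):
        print("load results into the warehouse")  # placeholder

    with DAG(
        dag_id="daily_etl_pipeline",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        t_extract >> t_transform >> t_load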
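
A minimal boto3 sketch of reading records from a Kinesis stream and landing them in S3 for downstream processing, matching the Kinesis bullet above. The stream name, bucket, key layout, and single-shard batching are assumptions.

    # Illustrative boto3 consumer: read a batch of records from one Kinesis shard
    # and land them as a JSON object in S3 for downstream processing.
    # Stream name, bucket, and key naming are hypothetical.
    import json

    import boto3

    kinesis = boto3.client("kinesis", region_name="us-east-1")
    s3 = boto3.client("s3", region_name="us-east-1")

    stream = "example-events"
    shard_id = kinesis.describe_stream(StreamName=stream)["StreamDescription"]["Shards"][0]["ShardId"]

    iterator = kinesis.get_shard_iterator(
        StreamName=stream,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

    resp = kinesis.get_records(ShardIterator=iterator, Limit=100)
    records = [json.loads(r["Data"]) for r in resp["Records"]]

    # Land the batch in the data lake bucket (hypothetical key naming).
    if records:
        s3.put_object(
            Bucket="example-data-lake",
            Key="raw/events/batch-0001.json",
            Body=json.dumps(records).encode("utf-8"),
        )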

Environment: PySpark, Scala, Python, AWS, Tableau, shell scripting, Apache Kafka, JIRA, Azure Data Factory, Azure Databricks.

Confidential, Indianapolis, IN

Big Data Developer

Responsibilities:

  • Spearheaded a big data project from end to end.
  • Created a data lake in Hadoop by extracting data from different sources.
  • Implemented feature engineering to prepare data for ML algorithms.
  • Incrementally loaded data into HDFS using Sqoop.
  • Tuned Hive queries.
  • Created shell scripts for module automation.
  • Assigned tasks to the team using Jira and tracked task progress across the team.
  • Created the Technical Design Document (TDD).
  • Created Spark jobs in Python for ETL and data analysis.
  • Loaded XML, JSON, CSV, and Parquet files using PySpark and Spark Scala jobs (see the loader sketch after this list).
  • Loaded complex XML files using JAXB libraries.
  • Analyzed real-time data using Spark Streaming.
  • Created UNIX shell scripts to call the Spark jobs.
  • Experience importing and exporting data between RDBMSs such as MySQL, Oracle, and SQL Server and HDFS/Hive using Sqoop.
  • Monitored all MapReduce read jobs running on the cluster using Cloudera Manager and ensured that they were able to read the data in HDFS without any issues.
  • Debugged the NameNode and Resource Manager logs when the cluster was down and jobs were failing.
  • Experience rebalancing an HDFS cluster.
  • Wrote shell scripts to perform audits on the Hive tables.
  • Extracted data from the Hive tables for data analysis.
  • Developed MapReduce code for data cleaning and transformation.
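
A hedged PySpark sketch of loading the JSON, CSV, Parquet, and XML files mentioned in the loader bullet above. The paths and row tag are placeholders, and the XML read assumes the spark-xml package is available on the cluster.

    # Illustrative PySpark loads for the file formats mentioned above.
    # Paths and the XML row tag are hypothetical; reading XML assumes the
    # spark-xml (com.databricks.spark.xml) connector is on the classpath.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("multi_format_loader").getOrCreate()

    json_df = spark.read.json("hdfs:///data/raw/events/*.json")

    csv_df = (
        spark.read.option("header", "true")
             .option("inferSchema", "true")
             .csv("hdfs:///data/raw/customers/*.csv")
    )

    parquet_df = spark.read.parquet("hdfs:///data/raw/orders/")

    # XML via the spark-xml connector (format name and rowTag per that package).
    xml_df = (
        spark.read.format("xml")
             .option("rowTag", "record")
             .load("hdfs:///data/raw/vendor_feed/*.xml")
    )

    # Example: persist a deduplicated copy into Hive for downstream analysis.
    csv_df.dropDuplicates().write.mode("overwrite").saveAsTable("staging.customers")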

Confidential

Software Engineer

Responsibilities:

  • Utilized Oracle Designer 6i to perform data modeling.
  • Documented tech specs for the proposed database design.
  • Devised PL/SQL packages and procedures for the back-end processing of the proposed database design.
  • Delivered PL/SQL training sessions for co-workers covering the latest PL/SQL features and PL/SQL performance tuning.
  • Facilitated database management.
  • Designed tables, synonyms, sequences, views, PL/SQL stored procedures and triggers
  • Performed testing and code review
  • Conducted performance tuning of the overall system by eliminating redundant joins, creating indexes, and removing redundant code. Developed UNIX shell scripts to perform a nightly refresh of the test system from production databases. Monitored user profiles, roles, and privileges for the Sybase database.
  • Maintained technical and functional documentation and all deliverables.

Environment: Oracle 9i, Forms Developer, Report Developer, HP-UX 11i, Sun Solaris 5.8 (UNIX), PuTTY, WinSCP, PL/SQL, UNIX scripting
