Sr. Data Engineer Resume

Ashburn, VA

SUMMARY:

  • Over 7 years of experience as a Data Engineer and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
  • Strong knowledge of Software Development Life Cycle (SDLC) and expertise in detailed design documentation.
  • Experience in Agile methodology; participated in sprints and daily scrums with onsite and offshore teams to deliver software tasks on time and with good quality.
  • Experience in development of Big Data projects using Hadoop, Hive, Pig, Flume and MapReduce open source tools/technologies.
  • Responsible for the setup, configuration, and migration of an Excel-based system to an Azure SQL database.
  • Experience in writing complex Pig scripts and Hive and Impala queries, and in importing data into Hadoop using Sqoop and exporting it back out.
  • Good working knowledge of big data on AWS cloud services: EC2, S3, EMR, DynamoDB, and Redshift.
  • Experience with client-server application development using Oracle PL/SQL, SQL*Plus, SQL Developer, and TOAD.
  • Good experience in data modeling and data analysis; proficient in gathering business requirements and handling requirements management.
  • Excellent experience on using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Solid knowledge of Data Marts, OLAP, OLTP, Dimensional Data Modeling with Ralph Kimball Methodology using Analysis Services.
  • Extensive experience using ER modeling tools such as Erwin and ER/Studio, along with Teradata and MDM.
  • Knowledge and experience in job work-flow scheduling and monitoring tools like Oozie and Zookeeper.
  • Knowledge in configuring and managing Cloudera's Hadoop platform, including CDH3 and CDH4 clusters.
  • Extensive experience with HBase high availability, verified manually using failover tests.
  • Proficient experience in importing and exporting data between HDFS and RDBMS using Sqoop.
  • Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
  • Good experience with real-time stream processing frameworks such as Kafka, Apache Storm, and Apache NiFi.
  • Addressing complex POCs according to business requirements from the technical end.
  • Experience in Data transformation, Data mapping from source to target database schemas, Data Cleansing procedures.
  • Good experience in working with different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS) and Business Objects.
  • Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external Hive tables to optimize performance.
  • Hands-on experience with analysis tools like Tableau for regression analysis, pie charts, and bar graphs.
  • Experience in writing Storm topology to accept the events from Kafka producer and emit into Cassandra DB.
  • Experience in working with different data sources like Flat files, XML files and Databases.
  • Strong experience with architecting highly performant databases using MySQL and Cassandra.
  • Experience in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive (a minimal sketch of this, together with the Hive table design noted above, appears at the end of this summary).
  • Worked on ad-hoc queries, indexing, replication, load balancing, and aggregation in MongoDB.
  • Ability to communicate and work effectively with associates at all levels within the organization.
  • Strong background in mathematics and have very good analytical and problem solving skills.
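
The bullets above on Hive table design and on comparing Spark with Hive refer to the following kind of work; this is a minimal, hedged sketch in Spark/Scala rather than actual project code. The table name, columns, and HDFS location (web_logs, event_date, /data/web_logs) are hypothetical, and a SparkSession built with Hive support is assumed:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

object HiveVsSparkSketch {
  def main(args: Array[String]): Unit = {
    // SparkSession with Hive support so spark.sql can reach the Hive metastore
    val spark = SparkSession.builder()
      .appName("hive-vs-spark-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External, partitioned, and bucketed Hive table (hypothetical schema and location)
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS web_logs (
        |  user_id STRING,
        |  url     STRING,
        |  bytes   BIGINT)
        |PARTITIONED BY (event_date STRING)
        |CLUSTERED BY (user_id) INTO 32 BUCKETS
        |STORED AS PARQUET
        |LOCATION '/data/web_logs'""".stripMargin)

    // The same aggregation expressed two ways, to compare HiveQL with the DataFrame API
    val viaHiveQl = spark.sql(
      "SELECT event_date, SUM(bytes) AS total_bytes FROM web_logs GROUP BY event_date")
    val viaDataFrame = spark.table("web_logs")
      .groupBy("event_date")
      .agg(sum("bytes").as("total_bytes"))

    viaHiveQl.show()
    viaDataFrame.show()
    spark.stop()
  }
}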

TECHNICAL SKILLS:

Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6.

Big Data Tools (Hadoop Ecosystem): Hadoop 3.0, MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j, Apache NiFi 1.6, Cassandra 3.11

Cloud Management: Amazon Web Services (AWS), Microsoft Azure

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2019, DB2.

Operating System: Windows, Unix, Sun Solaris

ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.

WORK EXPERIENCE:

Confidential - Ashburn, VA

Sr. Data Engineer

Responsibilities:

  • As a Sr. Data Engineer, worked with different NoSQL databases for information extraction and for storing huge amounts of data.
  • Installed, Configured and Maintained Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper and Sqoop.
  • Worked on migrating MapReduce programs into Spark transformations using Spark and Scala (a minimal sketch follows this role's Environment line).
  • Developed Oozie workflows for daily incremental loads, which pull data from Teradata and import it into Hive tables.
  • Involved in design, development and testing phases of Software Development Life Cycle (SDLC).
  • Utilized Agile Scrum Methodology to help manage and organize a team with regular code review sessions.
  • Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase.
  • Used a data replication tool to migrate on-premises RDBMSs to Azure Data Warehouse.
  • Developed MapReduce programs and custom UDFs in Hive and Pig for various data transformations.
  • Prepared Spark scripts to replace existing Pig scripts and process the data in the Spark framework.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Developed PL/SQL programs (Functions, Procedures, Packages and Triggers).
  • Created the packages in SSIS (ETL) with the help of Control Flow Containers, Tasks and Data Flow Transformations.
  • Implemented Security in Web Applications using Azure and deployed Web Applications to Azure.
  • Worked on different file formats like sequence files, XML files, and graph files using MapReduce.
  • Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX and NoSQL sources.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Involved in creating POCs to ingest and process data using Apache NiFi and StreamSets.
  • Rendered and delivered reports in desired formats by using reporting tools such as Tableau.
  • Developed PIG scripts to transform the raw data into intelligent data as specified by business users.
  • Performed performance tuning at the source, target, and DataStage job levels using indexes, hints, and partitioning.
  • Involved in big data analysis using Pig and user-defined functions (UDFs).
  • Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked in Azure environment for development and deployment of Custom Hadoop Applications.
  • Implemented Kafka high-level consumers to get data from Kafka partitions and move it into HDFS (a hedged sketch of this Kafka-to-HDFS pattern follows this role's Environment line).
  • Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Developed customized classes for serialization and De-serialization in Hadoop.
  • Involved in report development using reporting tools like Tableau and used Excel sheets, flat files, and CSV files to generate Tableau ad-hoc reports.
  • Involved in manipulating, cleansing, and processing data using Excel, Access, and SQL; responsible for loading, extracting, and validating client data.

Environment: Hadoop 3.0, Agile, Oracle 12c, Apache Hive 2.3, HDFS, Azure, SQL, PL/SQL, Apache Pig 0.17, ETL, Sqoop 1.4, HBase 1.2, Kafka 1.0, Tableau 10.5, Apache Flume 1.8, MapReduce.
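
A minimal sketch of the MapReduce-to-Spark migration mentioned in this role; the word-count-style logic and HDFS paths are illustrative only, not the actual programs:

import org.apache.spark.sql.SparkSession

object MapReduceToSparkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("mr-to-spark-sketch").getOrCreate()
    val sc = spark.sparkContext

    // The mapper/reducer pair of a classic MapReduce job becomes a chain of RDD transformations
    val counts = sc.textFile("hdfs:///data/input/logs")       // input splits -> RDD of lines
      .flatMap(_.split("\\s+"))                               // map: emit one token per word
      .map(word => (word, 1L))                                // map: (key, 1)
      .reduceByKey(_ + _)                                     // reduce: sum the counts per key

    counts.saveAsTextFile("hdfs:///data/output/word_counts")  // write results back to HDFS
    spark.stop()
  }
}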
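
A hedged sketch of the Kafka-to-HDFS path described in this role; the actual work used Kafka high-level consumers, while this shows the same landing pattern with Spark Structured Streaming. The broker address, topic name, and output paths are placeholders, and the spark-sql-kafka connector is assumed to be on the classpath:

import org.apache.spark.sql.SparkSession

object KafkaToHdfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kafka-to-hdfs-sketch").getOrCreate()

    // Subscribe to a Kafka topic (broker and topic names are placeholders)
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

    // Land the stream on HDFS as Parquet, with a checkpoint directory for reliable file output
    val query = events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/raw/events")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()

    query.awaitTermination()
  }
}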

Confidential, NY

Data Engineer

Responsibilities:

  • Worked as a Data Engineer; designed and modified database tables and used HBase queries to insert and fetch data from tables.
  • Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Involved in Agile development methodology as an active member in scrum meetings.
  • Worked using Apache Big Data Ecosystem components like HDFS, Hive, Sqoop, Pig, and Map Reduce.
  • Worked with MySQL for identifying required tables and views to export into HDFS.
  • Provided PL/SQL queries to developers as source queries to identify the data, along with the logic for assigning it.
  • Executed the Hive jobs to parse the logs and structure them in relational format to provide effective queries on the log data.
  • Worked on Data modeling, Advanced SQL with Columnar Databases using AWS.
  • Reverse Engineered the existing Stored Procedures and wrote Mapping Documents for them.
  • Used various features of Oracle like Collections, Associative arrays, Bulk processing methods to write effective code.
  • Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS).
  • Involved in the design of the new data mart for the Finance Department, working with the Erwin data modeling tool.
  • Developed Pig Scripts for capturing data change and record processing between new data and already existed data in HDFS.
  • Populated HDFS and Cassandra with huge amounts of data using Apache Kafka (a hedged producer sketch follows this role's Environment line).
  • Developed a star schema for the proposed central model and normalized the star schema into a snowflake schema.
  • Developed Pig scripts to transform the data into a structured format and automated them through Oozie coordinators.
  • Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
  • Used Flume to handle streaming data and loaded the data into the Hadoop cluster.
  • Developed operational data store to design data marts and enterprise data warehouses.
  • Developed Data mapping, Transformation and Cleansing rules for the Master Data Management involving OLTP, and OLAP.
  • Involved in Data profiling and performed Data Analysis based on the requirements, which helped in catching many Sourcing Issues upfront.
  • Worked on the Hortonworks platform using Hive, Sqoop, and HBase; this effort showcased the benefits of the Hortonworks Data Platform.
  • Developed optimal strategies for importing and exporting the stored web log data into HDFS and Hive using Sqoop.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Imported millions of structured records from relational databases using Sqoop and stored the data in HDFS in CSV format for processing (a hedged illustration of this import pattern follows this role's Environment line).
  • Developed the Apache Storm, Kafka, and HDFS integration project to perform real-time data analysis.

Environment: Hadoop 3.0, Oracle 12c, HDFS, AWS, Agile, MDM, HBase 1.2, Apache Hive 2.3, Apache Pig 0.17, Sqoop 1.4, OLTP, OLAP, Hortonworks, SQL, PL/SQL, Kafka 1.0.
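
A hedged sketch of the Kafka-based ingestion used to populate HDFS and Cassandra in this role; the broker address, topic name, and record contents are placeholders, and downstream consumers are assumed to land the data in HDFS and Cassandra:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object IngestProducerSketch {
  def main(args: Array[String]): Unit = {
    // Producer configuration (broker address is a placeholder)
    val props = new Properties()
    props.put("bootstrap.servers", "broker1:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)
    try {
      // Publish one source record to the ingestion topic (topic name and payload are hypothetical)
      val record = new ProducerRecord[String, String]("ingest.events", "record-1", "{\"id\":1,\"status\":\"ok\"}")
      producer.send(record)
    } finally {
      producer.flush()
      producer.close()
    }
  }
}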
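
The relational-to-HDFS imports in this role were done with Sqoop, a command-line tool; as a hedged illustration of the same import pattern in the Scala used for the other sketches here, the snippet below reads a table over JDBC and lands it in HDFS as CSV. The JDBC URL, credentials, table name, and target path are placeholders:

import org.apache.spark.sql.SparkSession

object JdbcImportSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("jdbc-import-sketch").getOrCreate()

    // Read a relational table over JDBC (connection details and table name are placeholders)
    val orders = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")
      .option("dbtable", "SALES.ORDERS")
      .option("user", "etl_user")
      .option("password", sys.env.getOrElse("DB_PASSWORD", ""))
      .load()

    // Land the data in HDFS as CSV, mirroring the Sqoop import target layout
    orders.write
      .option("header", "true")
      .mode("overwrite")
      .csv("hdfs:///data/staging/orders_csv")

    spark.stop()
  }
}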

Confidential - Indianapolis, IN

Data Analyst/Data Engineer

Responsibilities:

  • As a Data Analyst/Data Engineer, was responsible for all data-related aspects of the project.
  • Analyzed the Business information requirements and examined the OLAP source systems to identify the measures, dimensions and facts required for the reports.
  • Involved with the Business Analysts' team in requirements gathering and based on provided business requirements, defined detailed Technical specification documents.
  • Worked on migrating SQL Server databases, data warehouse & reporting to AWS using Redshift, DynamoDB, & Tableau
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Wrote SQL Queries using Joins, Sub Queries and correlated sub Queries to retrieve data from the database.
  • Loaded and transformed large sets of structured, semi structured and unstructured data using Hadoop/Big Data concepts.
  • Developed MapReduce jobs for cleaning, accessing and validating the data and Installed Oozie workflow engine to run multiple Hive and Pig jobs.
  • Involved in migration of data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing data.
  • Created SSIS packages for data importing, cleansing, and parsing; extracted, cleaned, and validated the data.
  • Involved in modifying various existing packages, Procedures, functions, triggers according to the new business needs.
  • Worked with Sqoop in Importing and exporting data from different databases like MySQL, Oracle into HDFS and Hive.
  • Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
  • Wrote Scripts to generate Map Reduce jobs and performed ETL procedures on the data in HDFS.
  • Extensively performed data profiling, data cleansing, and de-duplication of the data, with good knowledge of best practices.
  • Connected to various sources of data via Tableau to validate and build dashboards, created story lines, and made presentations on the findings.
  • Involved in data validation of the results in Tableau by validating the numbers against the data in the database.
  • Wrote Pig Latin scripts and also developed UDFs for Pig Data Analysis.
  • Developed and deployed quality T-SQL code, stored procedures, views, functions, triggers, and jobs.
  • Performed analysis and presented results using SQL, SSIS, Excel, and Visual Basic scripts.
  • Designed and Populated specific tables, databases for collection, tracking and reporting of data.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations (a minimal sketch follows this role's Environment line).
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.

Environment: Oracle 11g, AWS, Amazon Redshift, SSIS, HDFS, T-SQL, SQL, Tableau, Apache Hive 1.8, Sqoop, MapReduce.
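
A minimal sketch of pulling data from the HDFS data lake and massaging it with RDD transformations, as described in this role; the CSV layout (id,amount,status) and the validation rules are hypothetical:

import org.apache.spark.sql.SparkSession

object DataLakeCleansingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("datalake-cleansing-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Raw CSV pulled from the data lake (path and column layout are placeholders)
    val raw = sc.textFile("hdfs:///datalake/raw/transactions")

    val cleansed = raw
      .map(_.trim)
      .filter(line => line.nonEmpty && !line.startsWith("id,"))                  // drop blanks and the header row
      .map(_.split(","))
      .filter(cols => cols.length == 3 && cols(1).matches("-?\\d+(\\.\\d+)?"))   // keep rows with a numeric amount
      .map(cols => (cols(0), cols(1).toDouble, cols(2).toUpperCase))             // normalize the status column

    // Write the curated records back to HDFS
    cleansed
      .map { case (id, amount, status) => s"$id,$amount,$status" }
      .saveAsTextFile("hdfs:///datalake/curated/transactions")

    spark.stop()
  }
}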

Confidential

Data Analyst/Data Modeler

Responsibilities:

  • Conducted sessions with the Business Analysts to gather the requirements.
  • Analyzed and built proofs of concept to convert SAS reports into Tableau or use SAS datasets in Tableau.
  • Worked as a team to conduct Gap Analysis and identify data anomalies and determined the scope for these anomalies and suggested options for conversion and cleanup of data.
  • Created and executed SQL scripts to validate, verify and compare the source data to target table data.
  • Created and loaded temporary staging tables for Data validation and to enhance performance.
  • Performed data analysis and data profiling using complex SQL on various sources systems including Oracle.
  • Involved in identifying the data requirements and creating a data dictionary for the functionalities.
  • Created stored procedures, triggers, views, tables, and other SQL joins and statements for applications using T-SQL.
  • Created stored procedures using PL/SQL and tuned the databases and backend process.
  • Performed Data mining and hardcopy or electronic document study to improve and expand the databases in the application.
  • Worked with data investigation, discovery and mapping tools to scan every single data record from many sources.
  • Performed detailed requirement analysis and created data mapping documents
  • Used various transformations in SSIS Data Flow and Control Flow, including For Loop containers.
  • Acted as liaison between Business Intelligence and Business User groups to relay change requests.
  • Used predictive analytics such as machine learning and data mining techniques to make predictions about patients.
  • Worked on data verification and validation to ensure that the data generated according to the requirements was appropriate and consistent.
  • Wrote SQL and PL/SQL scripts to extract data from the database and for testing purposes.
  • Implemented Python scripts to parse XML documents and load the data into the databases (an analogous sketch follows this role's Environment line).
  • Implemented partitioning techniques to improve performance and optimize space utilization.
  • Wrote multiple SQL queries to analyze the data and presented the results using Excel, Access, and Crystal Reports.
  • Involved in designing Parameterized Reports for generating ad-hoc reports as per the business requirements

Environment: Oracle9i, SQL, PL/SQL, XML, MS Excel 2012, MS Access, Crystal Reports, SAS.
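
The XML loading in this role was implemented in Python; as a hedged illustration of the same parse-and-load pattern in the Scala used for the other sketches in this document, the snippet below reads an XML file and inserts rows over JDBC. The file name, element names, target table, and connection details are all hypothetical, and the scala-xml module and an Oracle JDBC driver are assumed to be on the classpath:

import java.sql.DriverManager
import scala.xml.XML

object XmlToDbSketch {
  def main(args: Array[String]): Unit = {
    // Parse the XML document (file name and element names are placeholders)
    val doc = XML.loadFile("records.xml")
    val rows = (doc \\ "record").map(rec => ((rec \ "id").text, (rec \ "name").text))

    // Load the parsed rows over JDBC (URL, credentials, and table are placeholders)
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@//dbhost:1521/ORCL", "app_user", sys.env.getOrElse("DB_PASSWORD", ""))
    try {
      val stmt = conn.prepareStatement("INSERT INTO xml_staging (id, name) VALUES (?, ?)")
      rows.foreach { case (id, name) =>
        stmt.setString(1, id)
        stmt.setString(2, name)
        stmt.executeUpdate()
      }
    } finally {
      conn.close()
    }
  }
}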
