
Data Engineer Resume


Minneapolis, MN

SUMMARY

  • Around 7 years of experience in data mining with large datasets of structured and unstructured data, including data acquisition, data validation, predictive modeling, and data visualization.
  • 2+ years of experience with Amazon Web Services and Google Cloud Platform.
  • Experienced in using R, SAS, Python, Tableau, and Power BI for data cleaning, data visualization, risk analysis, and predictive analytics.
  • Experience in the design and development of scalable systems using Hadoop technologies in a variety of environments. Extensive experience in analyzing data with Hadoop ecosystem components including HDFS, MapReduce, Spark, Hive, and Pig.
  • Experience in developing data pipelines using AWS services including EC2, S3, Redshift, Glue, Lambda, Step Functions, CloudWatch, SNS, DynamoDB, and SQS (a minimal Lambda sketch follows this list).
  • Strong hands-on experience with AWS services including SageMaker, EMR, S3, EC2, Route 53, RDS, ELB, DynamoDB, and CloudFormation.
  • Implementation experience mainly focused on setting up clusters and on data extraction, transformation, and loading (ETL).
  • Experience in developing MapReduce programs with Apache Hadoop for analyzing big data as per requirements.
  • Experience with major Hadoop ecosystem components such as Pig, Hive, HBase, Sqoop, and Kafka, and in monitoring them with Cloudera Manager and Ambari.
  • Hands-on experience with NoSQL databases, including HBase and its integration with a Hadoop cluster.
  • Good working experience using Sqoop to import data into HDFS from RDBMS and vice versa.
  • Involved in full life-cycle projects using object-oriented methodologies and programming (OOP).
  • Experience in working with Oracle using SQL, PL/SQL.
  • Experience in developing Hive Query Language scripts for data analytics.
  • Strong knowledge of and hands-on experience with SDLC methodologies and business process models (BPM) such as Agile and Waterfall.
  • Extensive experience building reports and dashboards in Microsoft Power BI on top of Azure SQL Data Warehouse and Microsoft Azure Analysis Services, including performance tuning of reports and resolving issues in the database and cubes.
  • Knowledge of retrieving data from different databases such as Oracle, SQL Server, MySQL, MS Access, DB2, and Teradata.
  • Expertise and broad knowledge of enterprise data warehousing, including data modeling, data architecture, data integration (ETL/ELT), and business intelligence.
  • Extensive experience in designing and developing ETL methodologies to support data transformation and processing in a corporate-wide environment using Teradata, mainframes, and UNIX shell scripting.
  • Experienced in dimensional and relational data modeling using ER/Studio, Erwin, and Sybase PowerDesigner, including star and snowflake schemas, fact and dimension tables, and conceptual, logical, and physical data models.
  • Good experience in production support: identifying root causes, troubleshooting, and submitting change controls.
  • Experienced in handling domain and technical interactions with application users, analyzing client business processes, and documenting business requirements.
  • Actively involved in requirements gathering, analysis, design, reviews, coding, code reviews, and unit and integration testing.
  • Working knowledge of J2EE technologies such as Servlets, JSP, Struts, Hibernate, EJB, and JDBC.
  • Knowledge of web services and SOA.
  • Designed and developed Microservices business components using Spring Boot.
  • Strong analytical skills with proficiency in debugging and problem solving.
  • Experienced with web/application servers such as IBM WebSphere 5.1/6.0.
  • Good knowledge of writing stored procedures and functions using SQL and PL/SQL.
  • Expertise in using version control systems such as Git.
  • Familiar with CI/CD tools like Jenkins, Ansible, Chef and Puppet.
  • Knowledge on Amazon Web Services and Microsoft Azure.
  • Strong verbal and written communication skills.
  • Experience working within an agile development process.
  • Worked with large teams and have always valued being a team player.
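
As a concrete illustration of the AWS pipeline experience above, the sketch below shows a minimal S3-triggered Lambda that records incoming file metadata in DynamoDB. It is illustrative only: the table name, attribute names, and bucket are hypothetical, not taken from any specific project.

    import boto3

    # Hypothetical audit table name; real pipelines combined several AWS services.
    TABLE_NAME = "ingest_audit"
    dynamodb = boto3.resource("dynamodb")

    def lambda_handler(event, context):
        """Triggered by S3 ObjectCreated events; logs each new object to DynamoDB."""
        table = dynamodb.Table(TABLE_NAME)
        records = event.get("Records", [])
        for record in records:
            s3_info = record["s3"]
            table.put_item(Item={
                "object_key": s3_info["object"]["key"],   # assumed partition key
                "bucket": s3_info["bucket"]["name"],
                "size_bytes": s3_info["object"]["size"],
                "event_time": record["eventTime"],
            })
        return {"processed": len(records)}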

TECHNICAL SKILLS

Hadoop / Big Data Technologies: HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Kafka, NiFi, Oozie.

Programming Languages: Python, Java (JDK 1.4–1.6), HTML, SQL, PL/SQL.

Web Services: SOAP, Apache, REST

Frameworks: Spring, Hibernate, Struts, EJB, JMS, JSF

Java/J2EE Technology: Servlets, JSP, Web Services, jQuery, JDBC, SOAP, REST, JMS, AJAX, XML.

Operating Systems: UNIX, Windows, LINUX

Databases: Oracle 8i/9i/10g, Microsoft SQL Server, DB2 & MySQL 4.x/5.x

NoSQL Databases: HBase, Cassandra

Java IDE: Eclipse 3.x, IBM WebSphere Application Developer, IBM RAD 7.0

Tools: SQL Developer, SOAP UI, ANT, Maven, Gradle.

PROFESSIONAL EXPERIENCE

Confidential, Minneapolis, MN

Data Engineer

Responsibilities:

  • Involved in big data requirements analysis and in designing and developing solutions for ETL and business intelligence platforms.
  • Applied core IT processes and ETL tools to query, validate, and analyze data.
  • Designed and developed scalable systems using Hadoop technologies, analyzing data with Hadoop ecosystem components including HDFS, MapReduce, Spark, Hive, and Pig.
  • Acted as SME for data warehouse-related processes.
  • Performed data analysis for building the reporting data mart.
  • Worked with reporting developers to oversee the implementation of report/universe designs.
  • Tuned Informatica mappings and sessions, identifying and eliminating bottlenecks to make the process more efficient.
  • Created AWS infrastructure for Salesforce syncs to/from Redshift.
  • Worked on AWS services such as DynamoDB, Lambda, EMR, IAM, and S3.
  • Designed and deployed data pipelines using AWS services such as EMR, DynamoDB, Lambda, Glue, EC2, S3, RDS, EBS, Elastic Load Balancer (ELB), and Auto Scaling groups.
  • Migrated an existing on-premises application to AWS; used AWS services such as DynamoDB, EC2, and S3 for small-dataset processing and storage, and maintained the Hadoop cluster on AWS EMR.
  • Used Spark SQL to load data, created schema RDDs on top of it, and loaded the results into Hive tables, handling structured data with Spark SQL (a minimal PySpark sketch follows this list).
  • Created AWS EC2 instances, set up AWS VPCs, and launched EC2 instances into different private and public subnets based on the requirements of each application.
  • Involved in converting HQL scripts into Spark transformations using Spark RDDs with Python and Scala.
  • Expertise in Informatica Cloud apps: Data Synchronization (DS), Data Replication (DR), task flows, and mapping configurations.
  • Worked on a migration project that included migrating webMethods code to Informatica Cloud.
  • Installed Kafka on the Hadoop cluster and configured producers and consumers in Java to stream data (filtered on popular hashtags) from the source into HDFS.
  • Loaded real-time data from various data sources into HDFS using Kafka.
  • Read multiple data formats from HDFS using Python.
  • Implemented Spark jobs using Python (PySpark) and Spark SQL for faster testing and processing of data.
  • Loaded data into Spark RDDs and performed in-memory computation.
  • Involved in converting Hive/SQL queries into Spark transformations using APIs such as Spark SQL and DataFrames in Python.
  • Analyzed SQL scripts and designed solutions to implement them in Python.
  • Improved performance and optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Performed transformations, cleaning, and filtering on imported data using the Spark DataFrame API, Hive, and MapReduce, and loaded the final data into Hive.
  • Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Python.
  • Developed Spark scripts using Python and shell commands as per requirements.
  • Worked with NoSQL databases such as HBase, creating tables to load large sets of semi-structured data coming from source systems.
  • Designed and developed the HBase target schema.
  • Used the Oozie workflow scheduler to manage Hadoop jobs with control flows.
  • Built report visualizations using Tableau.
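
A minimal PySpark sketch of the Spark SQL-to-Hive load pattern described above. It is illustrative only: the application name, HDFS path, column names, and Hive table are hypothetical.

    from pyspark.sql import SparkSession

    # Hive support lets Spark SQL write directly into Hive tables.
    spark = (SparkSession.builder
             .appName("sales-to-hive")        # hypothetical job name
             .enableHiveSupport()
             .getOrCreate())

    # Read semi-structured JSON landed on HDFS into a DataFrame.
    raw = spark.read.json("hdfs:///data/landing/sales/")

    # Basic cleaning and filtering with the DataFrame API.
    cleaned = (raw.dropna(subset=["order_id", "amount"])
                  .filter("amount > 0"))

    # Expose the DataFrame to Spark SQL and load it into a Hive table.
    cleaned.createOrReplaceTempView("sales_stg")
    spark.sql("""
        INSERT OVERWRITE TABLE analytics.sales
        SELECT order_id, store_id, amount, CAST(order_ts AS DATE) AS order_dt
        FROM sales_stg
    """)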

Environment: Apache Spark, HDFS, Java, MapReduce, Hive, HBase, Sqoop, SQL, Knox, Oozie, Cloudera Manager, ZooKeeper, Cloudera.

Confidential, Charlotte, NC

Data Engineer

Responsibilities:

  • Responsible for setting up a 5-node development cluster for a proof of concept that was later implemented as a full-time project by Fortune Brands.
  • Responsible for Installation and configuration of Hive, Sqoop, Zookeeper, Knox and Oozie on the Hortonworks Hadoop cluster using Ambari.
  • Involved in extracting large sets of structured, semi-structured, and unstructured data.
  • Developed Sqoop scripts to import data from an Oracle database and handled incremental loading of the point-of-sale tables.
  • Created Hive external tables and views on the data imported into HDFS.
  • Developed and implemented Hive scripts for transformations such as evaluation, filtering, and aggregation.
  • Created pipelines, data flows, and complex data transformations and manipulations using Azure Data Factory (ADF) and PySpark on Databricks.
  • Implemented both ETL and ELT architectures in Azure using Data Factory, Databricks, SQL Database, and SQL Data Warehouse.
  • Created reusable pipelines in Data Factory to extract, transform, and load data into Azure SQL Database and SQL Data Warehouse.
  • Partitioned Hive tables and ran the scripts in parallel to reduce their run time (a minimal partitioning sketch follows this list).
  • Built pipelines integrating Azure with AWS S3 to bring data into the Azure database.
  • Created external tables in Azure SQL Database for data visualization and reporting purposes.
  • Worked on the most critical finance projects and served as the go-to person for team members on data-related issues.
  • Migrated ETL code from Talend to Informatica; involved in development, testing, and post-production support for the entire migration project.
  • Developed user-defined functions (UDFs) in Java where required for Hive queries.
  • Worked with data in multiple file formats, including Avro, Parquet, Sequence files, ORC, and Text/CSV.
  • Used Oozie operational services for batch processing and scheduling workflows dynamically.
  • Worked on creating end-to-end data pipeline orchestration using Oozie.
  • Developed Bash scripts to automate the above extraction, transformation, and loading process.
  • Implemented Authentication and Authorization using Kerberos, Knox and Apache Ranger.
  • Managed the Hadoop cluster using Ambari.
  • Responsible for creating a business process and workflow documentation using BPMN standards.
  • Designed the test architecture and scenarios for automation.
  • Performed business data modeling and analysis, with a thorough understanding of relational data structures, star and snowflake schemas, and OLAP concepts.
  • Created roles and user groups in Ambari for permitted access to Ambari functions.
  • Working knowledge of MapReduce and YARN architectures.
  • Working knowledge of ZooKeeper.
  • Working knowledge of Tableau.
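
A minimal sketch of the Hive external-table and partitioning pattern referenced above, issued through Spark SQL. It is illustrative only: the database, table, columns, HDFS locations, and partition value are hypothetical.

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName("pos-external-tables")   # hypothetical job name
             .enableHiveSupport()
             .getOrCreate())

    # External table over point-of-sale data imported by Sqoop (hypothetical schema).
    spark.sql("""
        CREATE EXTERNAL TABLE IF NOT EXISTS retail.pos_sales (
            txn_id   STRING,
            store_id STRING,
            amount   DECIMAL(10,2)
        )
        PARTITIONED BY (sale_date STRING)
        STORED AS ORC
        LOCATION 'hdfs:///data/retail/pos_sales'
    """)

    # Register a newly landed daily partition so queries can prune by date.
    spark.sql("""
        ALTER TABLE retail.pos_sales
        ADD IF NOT EXISTS PARTITION (sale_date = '2019-06-01')
        LOCATION 'hdfs:///data/retail/pos_sales/sale_date=2019-06-01'
    """)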

Environment: Apache Hadoop, HDFS, Java, MapReduce, Hive, Pig, Sqoop, SQL, Knox, Oozie, Ambari, Ranger, ZooKeeper, Hortonworks (HDP).

Confidential

Data Analyst

Responsibilities:

  • Gathered business requirements and prepared technical design documents, the target-to-source mapping document, and the mapping specification document.
  • Extensively worked on Informatica PowerCenter.
  • Identified sources and performed data profiling on the identified sources.
  • Prepared specification documents (BRD/FRD) based on business rules provided by the business.
  • Prepared Business Process and Data Process Models using MS Visio.
  • Involved in data migration between Teradata, MS SQL Server, DB2, and Oracle.
  • Worked with data load/export utilities such as BTEQ, FastLoad, MultiLoad, and FastExport in UNIX/mainframe environments.
  • Worked extensively with stored procedures, triggers, functions, cursors, views, materialized views, and analytical functions.
  • Extracted data from the database using SAS/ACCESS and SAS SQL procedures, and created SAS data sets.
  • Created Teradata SQL scripts using OLAP functions such as RANK() to improve query performance when pulling data from large tables (a minimal sketch follows this list).
  • Worked on MongoDB database concepts such as locking, transactions, indexes, sharding, replication, and schema design.
  • Parsed complex files using Informatica data transformations and loaded them into the database.
  • Optimized query performance using Oracle hints, forced indexes, constraint-based loading, and other approaches.
  • Designed and developed weekly and monthly reports using MS Excel techniques (charts, graphs, pivot tables) and PowerPoint presentations.
  • Applied strong Excel skills, including pivot tables, VLOOKUP, conditional formatting, and handling large record sets, for data manipulation and cleaning.
  • Worked extensively on UNIX shell scripting to split groups of files into smaller files and to automate file transfers.
  • Worked with Autosys scheduler for scheduling different processes.
  • Performed basic and unit testing.
  • Assisted in UAT Testing and provided necessary reports to the business users.
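
A minimal sketch of the windowed RANK() pattern referenced above. The RANK() OVER (PARTITION BY ... ORDER BY ...) form is standard SQL and is the shape the Teradata scripts used; SQLite is substituted here only so the example is self-contained and runnable, and the table and columns are hypothetical.

    import sqlite3

    # In-memory stand-in for a large Teradata table (hypothetical schema).
    # Requires SQLite 3.25+ (bundled with recent Python builds) for window functions.
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE sales (region TEXT, store_id TEXT, revenue REAL);
        INSERT INTO sales VALUES
            ('MW', 'S1', 120.0), ('MW', 'S2', 340.0),
            ('NE', 'S3', 210.0), ('NE', 'S4', 210.0);
    """)

    # Rank stores by revenue within each region using an OLAP window function.
    query = """
        SELECT region, store_id, revenue,
               RANK() OVER (PARTITION BY region ORDER BY revenue DESC) AS rev_rank
        FROM sales
    """
    for row in con.execute(query):
        print(row)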

Environment: Informatica PowerCenter 8.6, Oracle 10g/11g, UNIX Shell Scripting, Autosys

Confidential

ETL/Data Warehouse Developer

Responsibilities:

  • Gathered business requirements and prepared technical design documents, the target-to-source mapping document, and the mapping specification document.
  • Extensively worked on Informatica PowerCenter.
  • Parsed complex files using Informatica data transformations and loaded them into the database.
  • Optimized query performance using Oracle hints, forced indexes, constraint-based loading, and other approaches.
  • Worked extensively on UNIX shell scripting to split groups of files into smaller files and to automate file transfers (a minimal splitting sketch follows this list).
  • Worked with Autosys scheduler for scheduling different processes.
  • Performed basic and unit testing.
  • Assisted in UAT Testing and provided necessary reports to the business users.
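
A minimal sketch of the file-splitting step referenced above. The original work used UNIX shell scripts (along the lines of the coreutils split command); it is sketched here in Python only to keep a single language across the examples in this document, and the file names and chunk size are hypothetical.

    from pathlib import Path

    def split_file(src, out_dir, lines_per_chunk=100_000):
        """Split a large delimited file into smaller, numbered chunk files."""
        out_dir = Path(out_dir)
        out_dir.mkdir(parents=True, exist_ok=True)

        chunk_idx, buffer = 0, []
        with open(src, "r", encoding="utf-8") as fh:
            for line in fh:
                buffer.append(line)
                if len(buffer) >= lines_per_chunk:
                    (out_dir / f"part_{chunk_idx:04d}.dat").write_text("".join(buffer), encoding="utf-8")
                    chunk_idx, buffer = chunk_idx + 1, []
        if buffer:  # flush the final partial chunk
            (out_dir / f"part_{chunk_idx:04d}.dat").write_text("".join(buffer), encoding="utf-8")

    if __name__ == "__main__":
        # Hypothetical input file and output directory.
        split_file("daily_extract.dat", "split_out", lines_per_chunk=50_000)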

Environment: Informatica PowerCenter 8.6, Oracle 10g/11g, UNIX Shell Scripting, Autosys
