
Data Engineer Resume


Nashville, TN

PROFESSIONAL SUMMARY:

  • Over seven years of IT experience across a variety of industries, including hands-on experience in Big Data and data warehouse ETL technologies.
  • Experience in Big Data processing using Hadoop and its ecosystem (MapReduce, Pig, Hive, Sqoop, Flume, Spark).
  • Experience in AWS EC2, configuring servers for Auto Scaling and Elastic Load Balancing.
  • Good working experience with Spark (Spark Streaming, Spark SQL), Scala, and Kafka.
  • Good working knowledge on Snowflake and Teradata databases.
  • Designed and delivered solutions for complex data issues.
  • Experience in designing and developing scalable systems using Hadoop technologies across multiple environments; extensive experience analyzing data with Hadoop ecosystem tools including HDFS, MapReduce, Hive, and Pig.
  • Experience in understanding the security requirements for Hadoop.
  • Excellent Programming skills at a higher level of abstraction using Scala and Python.
  • Hands-on experience developing Spark applications using RDD transformations, Spark Core, Spark Streaming, and Spark SQL.
  • Strong experience and knowledge of real time data analytics using Spark Streaming.
  • Working knowledge of Amazon Elastic Compute Cloud (EC2) infrastructure for computational tasks and Simple Storage Service (S3) as a storage mechanism.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Excellent working experience in Big Data integration and analytics based on Hadoop, SOLR, Spark, Kafka, Storm, and webMethods technologies.
  • Experienced in designing and developing applications in Spark using Scala to compare the performance of Spark with Hive and SQL/Oracle.
  • Expert in understanding data and designing/implementing enterprise platforms such as Hadoop data lakes and large-scale data warehouses.
  • Hands-on experience working with NoSQL databases including HBase, MongoDB, and Cassandra, and their integration with the Hadoop cluster.
  • Good experience in designing the data flow for consolidating four legacy data warehouses into an AWS data lake.
  • Expertise in converting existing AWS infrastructure to serverless architecture (AWS Lambda, Kinesis), deployed via Terraform or AWS CloudFormation.
  • Good knowledge of Snowflake Multi-Cluster Warehouses.
  • Working knowledge of Snowflake Virtual Warehouses, gained through hands-on sessions.
  • Strong knowledge of and experience with implementing Big Data workloads on Amazon EMR, which manages the Hadoop framework on dynamically scalable Amazon EC2 instances.
  • Used NoSQL databases including HBase, MongoDB, and Cassandra.
  • Good experience with all major Hadoop distributions (Cloudera, Hortonworks, MapR, etc.).
  • Good knowledge in RDBMS concepts (Oracle 11g, MS SQL Server 2000) and strong SQL, PL/SQL query writing skills (by using TOAD & SQL Developer tools), Stored Procedures and Triggers.
  • Expertise in developing jobs using Spark framework modules such as Spark Core, Spark SQL, and Spark Streaming with Java, Scala, and Python.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs (a minimal sketch follows this summary).
  • Good experience in creating and designing data ingestion pipelines using technologies such as Apache Storm and Kafka.
  • Experienced in working with Spark's in-memory processing, including RDD transformations, Spark SQL, and Spark Streaming.
  • Good working experience on using Sqoop to import data into HDFS from RDBMS and vice-versa.
  • Experienced in implementing POCs using Spark SQL libraries.
  • Improved the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and YARN.
  • Hands on experience in handling Hive tables using Spark SQL.
  • Efficient in writing MapReduce Programs and using Apache Hadoop API for analyzing the structured and unstructured data.
  • Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries.
  • Debugged Pig and Hive scripts and optimized and debugged MapReduce jobs.
  • Hands-on experience in managing and reviewing Hadoop logs.
  • Good knowledge about YARN configuration.
  • Extending Hive and Pig core functionality by writing custom UDFs.
  • Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
  • Hands on experience in configuring and working with Flume to load the data from multiple sources directly into HDFS.
  • Knowledge of job workflow scheduling and monitoring tools such as Oozie (Hive, Pig actions) and DAG-based orchestration (AWS Lambda).
  • Developed various shell scripts and Python scripts to address production issues.
  • Developed and designed an automation framework using Python and shell scripting.
  • Good Knowledge of data compression formats like Snappy, Avro.
  • Experience with all stages of the SDLC and Agile Development model right from the requirement gathering to Deployment and production support.
  • Involved in daily SCRUM meetings to discuss the development/progress and was active in making scrum meetings more productive.
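
The following is a minimal PySpark sketch of the Hive/SQL-to-Spark conversion work referenced above; the table, column, and path names are hypothetical placeholders rather than the actual datasets.

```python
# Minimal PySpark sketch (hypothetical table/column names): a Hive-style
# aggregation expressed both through Spark SQL and as DataFrame transformations.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-example")
         .enableHiveSupport()          # lets Spark read existing Hive tables
         .getOrCreate())

# Original HiveQL-style query, run through Spark's SQL engine.
sql_result = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales.transactions
    WHERE txn_date >= '2020-01-01'
    GROUP BY customer_id
""")

# The same logic expressed as DataFrame transformations; both produce the same output.
df_result = (spark.table("sales.transactions")
             .filter(F.col("txn_date") >= "2020-01-01")
             .groupBy("customer_id")
             .agg(F.sum("amount").alias("total_amount")))

df_result.write.mode("overwrite").parquet("/data/output/customer_totals")
```

Writing the logic both ways is a common approach when comparing Spark SQL and DataFrame execution against the original Hive or Oracle runtime.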

PROFESSIONAL EXPERIENCE:

Data Engineer

Confidential, Nashville, TN

Responsibilities:

  • Understood the requirements and prepared the architecture document for the Big Data project.
  • Involved in creating the pipeline using ZDP, covering data ingestion, preprocessing, schema validation, and loading data into the Raw and Trusted zones; the pipeline also includes an exception path that sends an email notification to the distribution list on failure.
  • Used Hive as the database; Hive INSERT...SELECT queries loaded the data from the Raw zone to the Trusted zone (a simplified sketch of this step follows this list).
  • Involved in defining naming standards for databases, tables, and columns.
  • Hands-on experience with shell scripting and Python scripts.
  • Good experience with Apache Ranger security; created Ranger policies that assign groups to users.
  • Involved in masking and unmasking data for end users in Ranger.
  • Created separate S3 buckets for the Raw, Trusted, and Landing zones.
  • Hands-on experience with the ZDP platform used for data processing.
  • Involved in creating views on top of the tables in the Hive database.
  • Good experience on creating the reports using Power BI and Tableau.
  • Involved in creating IAM roles, policies, and groups; maintained cloud security using Ranger and MFA for internal users. The application uses LDAP.
  • Imported and Exported Data from Different Relational Data Sources like DB2, SQL Server, Teradata to HDFS using Sqoop.
  • Migrated complex MapReduce programs into in-memory Spark processing using transformations and actions.
  • Worked on creating RDDs and DataFrames for the required input data and performed data transformations using PySpark.
  • Involved in developing Spark SQL queries and DataFrames to import data from data sources, perform transformations and read/write operations, and save the results to an output directory in HDFS.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Developed Pig scripts for the analysis of semi-structured data.
  • Developed Pig UDFs for manipulating data according to business requirements and worked on developing custom Pig loaders.
  • Worked on Oozie workflow engine for job scheduling.
  • Developed Oozie workflow for scheduling and orchestrating the ETL process.
  • Experienced in managing and reviewing the Hadoop log files using Shell scripts.
  • Migrated ETL jobs to Pig scripts to do Transformations, even joins and some pre-aggregations before storing the data onto HDFS.
  • Worked on different file formats like Sequence files, XML files and Map files using MapReduce Programs.
  • Worked with Avro Data Serialization system to work with JSON data formats.
  • Used AWS S3 to store large amounts of data in a common repository.
  • Involved in building applications using Maven and integrating with continuous integration servers such as Bamboo to build jobs.
  • Used the Enterprise Data Warehouse database to store information and make it accessible across the organization.
  • Responsible for preparing technical specifications, analyzing functional Specs, development, and maintenance of code.
  • Worked with the Data Science team to gather requirements for various data mining projects.
  • Wrote shell scripts to automate routine day-to-day processes.
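
Below is a simplified PySpark sketch of the Raw-to-Trusted load described above; the bucket names, schema, and table names are hypothetical placeholders, not the actual ZDP pipeline objects.

```python
# Illustrative PySpark sketch of a Raw-to-Trusted load (bucket, schema, and
# table names are hypothetical).
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

spark = (SparkSession.builder
         .appName("raw-to-trusted")
         .enableHiveSupport()
         .getOrCreate())

expected_schema = StructType([
    StructField("order_id", StringType(), False),
    StructField("customer_id", StringType(), True),
    StructField("amount", DoubleType(), True),
])

# Read from the Raw zone bucket, enforcing the expected schema.
raw_df = (spark.read
          .schema(expected_schema)
          .json("s3://example-raw-bucket/orders/"))

# Basic validation: drop records missing the key column.
clean_df = raw_df.filter(raw_df.order_id.isNotNull())

# Load into the Trusted-zone Hive table (INSERT ... SELECT equivalent).
clean_df.createOrReplaceTempView("orders_staging")
spark.sql("""
    INSERT OVERWRITE TABLE trusted.orders
    SELECT order_id, customer_id, amount FROM orders_staging
""")
```

In the actual pipeline, records failing schema validation would be routed to the exception path and trigger the email notification described above.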

Data Engineer

Confidential

Responsibilities:

  • Implemented and experienced in creating S3 buckets and IAM roles in the non-prod region.
  • Implemented Spark transformations and ingestion into the data lake.
  • Hands-on experience in designing and implementing EMR clusters in the non-prod and prod regions, sized according to data ingestion volumes.
  • Hands-on experience implementing Spark jobs.
  • Designed and implemented Control-M jobs to complete the ingestion process.
  • Designed and implemented Oozie jobs to run the ingestion.
  • Designed the ingestion cluster to perform data transformations.
  • Configured Spark jobs for fast ingestion and allocated enough resources to handle 10 TB of data on a daily basis (see the sketch after this list).
  • Responsible for Account management, IAM Management and Cost management.
  • Designed AWS CloudFormation templates to create VPCs, subnets, and NAT to ensure successful deployment of web applications and database templates.
  • Created S3 buckets and managed their policies; utilized S3 and Glacier for storage and backup on AWS.
  • Experience managing IAM users: creating new users, granting limited access as needed, and assigning roles and policies to specific users.
  • Created RDDs in Spark.
  • Extracted data from the Teradata data warehouse into Spark RDDs.
  • Experience with Spark using Scala and Python.
  • Implemented build and deploy plans from scratch.
  • Hands-on experience with Bitbucket and Bamboo.
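
The sketch below illustrates this kind of high-volume ingestion job in PySpark; the Teradata JDBC connection details, table name, resource settings, and output path are hypothetical placeholders and would be tuned to the actual daily volume.

```python
# Hypothetical PySpark ingestion sketch: parallel JDBC extract from Teradata
# into the data lake. Connection details, names, and sizes are placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("teradata-daily-ingest")
         # Example resource settings; actual values depend on the daily volume.
         .config("spark.executor.memory", "8g")
         .config("spark.executor.instances", "20")
         .config("spark.sql.shuffle.partitions", "400")
         .getOrCreate())

# Partitioned JDBC read so the extract runs in parallel across executors.
orders_df = (spark.read.format("jdbc")
             .option("url", "jdbc:teradata://teradata-host/DATABASE=sales")
             .option("dbtable", "sales.orders")
             .option("user", "etl_user")
             .option("password", "********")
             .option("partitionColumn", "order_id")
             .option("lowerBound", "1")
             .option("upperBound", "100000000")
             .option("numPartitions", "64")
             .load())

# Land the extract in the data lake as Parquet.
orders_df.write.mode("append").parquet("s3://example-datalake/raw/orders/")
```

Running a job like this requires the Teradata JDBC driver on the Spark classpath.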

Hadoop Developer

Confidential

Responsibilities:

  • Installed and configured Hadoop MapReduce, HDFS
  • Importing and exporting data into HDFS and Hive using Sqoop.
  • Experienced in defining job flows. Experienced in managing and reviewing Hadoop log files.
  • Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Experienced in running Hadoop Streaming jobs to process terabytes of XML-format data (see the mapper sketch after this list).
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources.
  • Supported MapReduce programs running on the cluster.
  • Involved in loading data from UNIX file system to HDFS.
  • Installed and configured Hive and written Hive UDFs.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries, which run internally as MapReduce jobs.
  • Conducted functional, system, data, and regression testing.
  • Involved in Bug Review meetings and participated in weekly meetings with management team.
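
As an illustration of the Hadoop Streaming work, here is a minimal Python mapper sketch; the XML tag name and the counting logic are hypothetical examples, not the production job.

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming mapper sketch: pulls an <id> value out of each XML
# line and emits tab-separated <id, 1> pairs for a downstream count reducer.
# The tag name and record layout are hypothetical.
import re
import sys

RECORD_ID = re.compile(r"<id>(.*?)</id>")

for line in sys.stdin:
    match = RECORD_ID.search(line)
    if match:
        # Hadoop Streaming treats everything before the first tab as the key.
        print("%s\t1" % match.group(1))
```

A job like this would be submitted with the hadoop-streaming JAR together with a matching reducer script.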

Confidential

ETL Developer

Responsibilities:

  • Analyzed the source system and involved in designing the ETL data load.
  • Developed/designed Informatica mappings by translating the business requirements.
  • Worked with various transformations such as Lookup, Joiner, Sorter, Aggregator, Router, Rank, and Source Qualifier to create complex mappings.
  • Involved in performance tuning of the Informatica mappings using components such as parameter files and round-robin and key-range partitioning to ensure source and target bottlenecks were removed.
  • Implemented documentation standards and practices to make mappings easier to maintain.
  • Performed extensive SQL querying for data analysis; wrote, executed, and performance-tuned SQL queries for data analysis and profiling. Extracted business rules and implemented business logic to extract and load data into SQL Server using T-SQL.
  • Worked with Teradata utilities such as FastLoad and MultiLoad.
  • Involved in automating retail prepaid system process. Created packages and dependencies of the processes.
  • Identified common issues in Cognos and published in NJSI Wiki page. Established Dashboards and Business reports.
  • Supported, maintained, and enhanced the Wiki page and developed new interfaces for the Claim warehouse application.
  • Used Autosys for scheduling various data cleansing scripts and loading processes; maintained the batch processes using UNIX Scripts.
  • Monitored and troubleshot batches and sessions for weekly and monthly extracts from various data sources across all platforms to the target database.
  • Tuned the mappings by removing the Source/Target bottlenecks and Expressions to improve the throughput of the data loads.

Tools and Environment: Informatica PowerCenter, Oracle 10g, PL/SQL, MS SQL Server, Cognos, Autosys, and Quality Center

Business Analyst and Data Analyst

Confidential

Responsibilities:

  • Liaised with business and functional owners during project scoping and planning to identify the high-level requirements for the project.
  • Documented the functional requirements and created the various other process flow diagrams and artifacts.
  • Performed risk analysis of the requirements to identify the project's critical success factors and prioritize functional requirements.
  • Defined specifications like use case documentation, activity diagram, and business process flow using Microsoft Visio.
  • Participated in Joint Application Development (JAD) sessions for requirements gathering.
  • Created Required Data Elements (RDE) to define the business rules for source to target mapping.
  • Created Requirements Traceability Matrix (RTM) and tracked solution validation to requirements matrix.
  • Performed gap analysis of business rules, business and system process flows, user administration, and requirements. Worked with the functional owners to bridge the gaps.
  • Led calls with the SMEs and the business to finalize complex business logic and transformation rules.
  • Coordinated and managed the execution of User Acceptance Testing.
  • Involved in OLAP creation, data analysis, and data processing: extracting data from different sources, cleansing and transferring data, distributing data, creating mappings, debugging, optimization, slowly changing dimensions, comparing mappings, and designing the data mart.
  • Involved in the data profiling activities for the column assessment and natural key study.
  • Performed complex data analysis in support of ad-hoc, standard, and project related requests.
  • Identified and resolved data related issues.
  • Ensured Quality standards to meet the Data Quality SLAs.
  • Performed Data Mining to analyze the patterns of the data sets.
  • Provided support to the development and testing teams during the lifecycle of the project.
  • Responsible for framework modeling, report writing (relational and multidimensional) and creating dashboard components.
  • Analyzed emerging business trends and production patterns and forecast demand using BI reports.
  • Involved in updating the existing Universe and Model when the reports require changes.
