
Data Engineer Resume


Memphis, TN

SUMMARY

  • Sr. Data Engineer with 8+ years of experience in the design and development of analytics/big data applications using leading industry tools, working with Fortune firms.
  • Well-rounded experience in ETL, Hadoop, Spark, data modeling, and data visualization.
  • Good understanding of Big Data concepts such as Hadoop, MapReduce, YARN, Spark, RDDs, DataFrames, Datasets, and Streaming.
  • Responsible for designing and developing T-SQL queries, ETL packages, and business reports using SQL Server Management Studio (SSMS), the MS BI Suite, and Tableau.
  • Experience migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory.
  • Adept in statistical programming languages such as R and MATLAB, and in Apache Spark, along with Big Data technologies like Hadoop, Hive, Pig, and BigQuery.
  • Deep understanding of, and exposure to, the Big Data ecosystem.
  • Experienced in writing Pig Latin scripts, MapReduce jobs and HiveQL.
  • Expertise in creating HDInsight clusters and Storage Accounts with an end-to-end environment for running jobs.
  • Experience working with Cloudera, Hortonworks, and Microsoft Azure HDInsight distributions.
  • Experience using various packages in R and Python, including caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, rpy2, and partykit.
  • 5+ years of experience in BI application design with MicroStrategy, Tableau, and Power BI.
  • Proficient in Hive, Oracle, SQL Server, SQL, PL/SQL, and T-SQL, and in managing very large databases.
  • Hands-on programming experience in Java and Scala.
  • Experience writing in-house UNIX shell scripts for Hadoop and Big Data development.
  • Skilled in performance tuning of data pipelines, distributed datasets, databases and SQL query performance.
  • Worked with various machine learning models, including binary classification, multiclass classification, and regression models, while developing business-centric AI/ML solutions.
  • Work closely with development teams to ensure accurate integration of machine learning models into firm platforms
  • Develop API services in an Agile environment
  • Hands-on experience with AWS cloud services (VPC, EC2, S3, RDS, Redshift, Data Pipeline, EMR, DynamoDB, WorkSpaces, Lambda, Kinesis, SNS, SQS).
  • Experience managing multi-tenant Cassandra clusters in a public cloud environment, Amazon Web Services (AWS) EC2.
  • Experience executing batch jobs against data streams with Spark Streaming.
  • Expert in importing and exporting data between relational database systems such as MySQL and Oracle and HDFS/Hive using Sqoop.
  • Expertise in extending Hive and Pig core functionality by writing custom UDFs.
  • Expert in working with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries (a brief sketch follows this summary).
  • Experience with databases such as DB2, MySQL, SQL Server, and MongoDB.
  • Developed various cross-platform products while working with Hadoop file formats such as SequenceFile, RCFile, ORC, Avro, and Parquet.
  • Developed reusable solutions to maintain consistent coding standards across Java projects. Experienced in application development and maintenance of SDLC projects using programming languages such as Java, C, Scala, SQL, and NoSQL.
  • Experience in creating complex SQL queries and in SQL tuning, and in writing PL/SQL blocks such as stored procedures, functions, cursors, indexes, triggers, and packages.
  • Have good knowledge on NoSQL databases like HBase, Cassandra and MongoDB.
  • Extensively used ETL methodology for data migration, extraction, transformation, and loading with Talend, and designed data conversions from a wide variety of source systems. Experienced in developing and designing web services (SOAP and RESTful).
  • Highly proficient in writing complex SQL queries, stored procedures, and triggers, with solid experience in PL/SQL and T-SQL.
  • Experienced in developing web interfaces using Servlets, JSP, and Custom Tag Libraries.
  • Thorough knowledge of the software development life cycle (SDLC), database design, RDBMS, and data warehousing.
  • Experience in data migration from legacy systems to Salesforce.
  • Experience in writing complex SQL queries involving multiple tables with inner and outer joins.
  • Expertise in various Java/J2EE technologies such as JSP, Servlets, Hibernate, Struts, and Spring.
  • Good knowledge of web-based UI development using jQuery UI, jQuery, Ext JS, CSS3, HTML, HTML5, XHTML, and JavaScript.
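The sketch below illustrates the Hive loading pattern referenced in the summary above: persisting ingested data as a partitioned, bucketed Hive table with Spark and querying it with HiveQL. It is a minimal illustration under assumed names; the table, columns, and path (sales.orders, order_date, customer_id, /data/raw/orders) are hypothetical placeholders rather than details of any specific engagement.

```python
# Minimal sketch: load raw data and persist it as a partitioned, bucketed
# Hive table, then query it through Spark SQL. All names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partitioned-load")
    .enableHiveSupport()          # needed so saveAsTable targets the Hive metastore
    .getOrCreate()
)

# Read raw source data (e.g., landed by Sqoop or a file-ingest job)
orders_raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/data/raw/orders")      # hypothetical HDFS path
)

# Write an ORC table partitioned by order_date and bucketed by customer_id
(
    orders_raw.write
    .mode("overwrite")
    .format("orc")
    .partitionBy("order_date")
    .bucketBy(8, "customer_id")
    .sortBy("customer_id")
    .saveAsTable("sales.orders")
)

# HiveQL-style query that prunes partitions on order_date
spark.sql(
    "SELECT order_date, COUNT(*) AS cnt FROM sales.orders "
    "WHERE order_date >= '2020-01-01' GROUP BY order_date"
).show()
```

Partitioning on the common filter column lets queries skip irrelevant data, while bucketing on the join key reduces shuffle when the table is joined downstream.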

TECHNICAL SKILLS

Big Data: Sqoop, Flume, Hive, Spark, Pig, Kafka, Talend, HBase, Impala

ETL Tools: Informatica, Talend, Microsoft SSIS, Confidential DataStage, dbt

Database: Oracle, SQL Server 2016, Teradata, Netezza, MS Access, Snowflake

Reporting: MicroStrategy, Microsoft Power BI, Tableau, SSRS, Business Objects (Crystal)

Business Intelligence: MDM, Metadata, Data Cleansing, OLAP, OLTP, SCD, SOA, REST, Web Services.

Tools: Ambari, SQL Developer, TOAD, Erwin, H2O.ai, Visio, Teradata

Operating Systems: Windows Server, UNIX/Linux (Red Hat, Solaris, AIX)

Languages: UNIX shell scripting, Scala, SQL, PL/SQL, T-SQL, Python, R

PROFESSIONAL EXPERIENCE

Confidential, MEMPHIS, TN

Data Engineer

Responsibilities:

  • Work with Project Manager, Business Leaders and Technical teams to finalize requirements and create solution design & architecture.
  • Architect the data lake by cataloging the source data, analyzing entity relationships, and aligning the design as per performance, schedule & reporting requirements
  • Architect BI applications as Enterprise Solution for Supply Chain, Online, Finances.
  • Design and Develop Hadoop ETL solutions to move data to the data lake using big data tools like Sqoop, Hive, Spark, HDFS, Talend etc.
  • Integrated and extracted source data using SSIS ETL tool and stored procedures created in SQL Server Management Studio (SSMS), and developed transformation logic and designed ETL packages.
  • Played a lead role in the development of the Confidential Data Lake and in building the Confidential Data Cube on a Microsoft Azure HDInsight cluster.
  • Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed it in Azure Databricks.
  • Design and Develop Spark code using Scala programming language & Spark SQL for high-speed data processing to meet critical business requirements
  • Worked with Informatica cloud for creating source and target objects, developed source to target mappings.
  • Experience with Informatica BDE-related work on HDFS, Hive, Oozie, Spark, and Sqoop.
  • Design and Develop ETL Processes in AWS Glue to migrate Campaign data from external sources like S3, ORC/Parquet/Text Files into AWS Redshift.
  • Implement RDD/Dataset/DataFrame transformations in Scala through SparkContext and HiveContext
  • Import python libraries into the transformation logic to implement core functionality
  • Wrote Spark SQL and embedded the SQL in Scala files to generate JAR files for submission to the Hadoop cluster.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process the data using the SQL activity.
  • Developed build and deployment scripts using Ant and Maven as build tools in Jenkins to promote code from one environment to another.
  • Working with industry-leading tools such as Amazon Kinesis Streams, Amazon Kinesis Firehose, and Kafka to collect data from IoT devices, geospatial sources, social media, and sensors, storing it in AWS S3 for effective Big Data processing, Business Intelligence, and Machine Learning.
  • Involved in Migrating Objects from Teradata to Snowflake.
  • Building the pipelines to copy the data from source to destination in Azure Data Factory (ADF V1)
  • Worked on creating dependencies between activities in Azure Data Factory.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts, and designed the solution and developed programs for data ingestion using Sqoop, MapReduce, shell scripts, and Python.
  • Created on-demand tables on S3 files (ORC/Parquet/text) using Lambda functions and AWS Glue with Python and PySpark, feeding the Glue-to-Redshift ETL described above (a minimal Glue sketch follows this list).
  • Responsible for migrating tables from traditional RDBMS into Hive tables using Sqoop and later generating required visualizations and dashboards using Tableau.
  • Documented all extract, transform, and load processes; designed, developed, validated, and deployed the Talend ETL processes for the data warehouse team using Pig and Hive.
  • Developed Spark code using Scala and Spark SQL for faster testing and processing of data. Implemented scripts for loading data from the UNIX file system to HDFS and migrated Hive queries into Spark transformations using DataFrames, Spark SQL, SQLContext, and Scala.
  • Managed mission-critical Hadoop cluster and Kafka at production scale, especially Cloudera distribution.
  • Enhanced customized processes on the mainframe for data extraction as flat files from the legacy database (i.e., migration of customer data).
  • Developed a data pipeline for ingesting data asynchronously from multiple producers into Kafka.
  • Implemented Spark using Scala and PySpark for faster testing and processing of data, and developed Python code to gather data from HBase, designing the solution implementation with PySpark.
  • Updated mappings, sessions, and workflows as part of ETL changes, modified existing ETL code and documented the changes, and developed complex SQL scripts on the Teradata database to build the BI layer on the DW for Tableau reporting.
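A minimal sketch of the AWS Glue pattern referenced above (processing campaign files from S3 into Redshift with Python/PySpark). The bucket, table, connection, and column names are hypothetical placeholders, and the Glue catalog connection and temporary S3 directory are assumed to exist already.

```python
# Hypothetical AWS Glue (PySpark) job: read campaign Parquet files from S3
# and load them into Redshift. All resource names below are placeholders.
import sys
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read Parquet campaign data from S3 into a DynamicFrame
campaigns = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://campaign-raw-bucket/parquet/"]},
    format="parquet",
)

# Rename and cast columns on the way in
mapped = ApplyMapping.apply(
    frame=campaigns,
    mappings=[
        ("campaign_id", "string", "campaign_id", "string"),
        ("spend", "double", "spend_usd", "double"),
    ],
)

# Write to Redshift through a pre-defined Glue catalog connection
glue_context.write_dynamic_frame.from_jdbc_conf(
    frame=mapped,
    catalog_connection="redshift_conn",      # assumed Glue connection
    connection_options={"dbtable": "analytics.campaigns", "database": "dw"},
    redshift_tmp_dir="s3://campaign-raw-bucket/tmp/",
)

job.commit()
```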

Environment: Hortonworks 2.3.5, Sqoop, Azure, SQL Server Management Studio (SSMS), dbt, Hive, Informatica, HDInsight cluster, AWS, Spark, Scala, Python, T-SQL, PL/SQL, Talend, DataStage, MicroStrategy 2019 (Developer), UNIX, H2O.ai, Ambari, Oozie.

Confidential, New Orleans

Data Engineer/Data Architect

Responsibilities:

  • Performed extensive data analysis and coordinated with the client teams to develop data models
  • Worked as the BI SME, converting business requirements into technical requirements and documentation.
  • Created data factories in Azure Data Factory.
  • Developed HQL scripts in Hive and Spark SQL to perform transformations on relational data and used Sqoop to export data back to the databases.
  • Experience with machine learning labeling and training workflows, integrating human labelers and using labeling tools and services to produce accurate results.
  • Prototyped a CI/CD system with GitLab, utilizing Kubernetes and Docker as the runtime environment for the CI/CD system to build, test, and deploy.
  • Worked on integrating Python with web development tools and web services.
  • Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
  • Developed UNIX shell scripts to perform ELT operations on big data, such as running Sqoop jobs, creating external/internal Hive tables, and initiating HQL scripts and BigQuery jobs.
  • Developed ETL/SQL code to load data from raw staging relational databases and ingested data into the Hadoop environment using Sqoop.
  • Developed and executed a migration strategy to move Data Warehouse from an Oracle platform to AWS Redshift.
  • Optimize Spark code in Scala through reengineering the DAG logic to use minimal resources and provide high throughput
  • Used Informatica file-watch events to poll the FTP sites for the external mainframe files.
  • Developed Pig scripts to transform unstructured and semi-structured streaming data.
  • Wrote scripts in Python to extract data from HTML files.
  • Developed data flow architecture & physical data model with Data Warehouse Architect
  • Wrote unit scripts to automate data load and performed data transformation operations
  • Performance-tuned the Hive code through map joins, partitions, vectorization, and computed statistics.
  • Proficient in developing web services (SOAP, RESTful) in Python using XML and JSON.
  • Performance-tuned the Spark code by minimizing shuffle operations, caching and persisting reusable RDDs, and adjusting the number of executors, cores, and tasks (a brief tuning sketch follows this list).
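A minimal PySpark sketch of the tuning approach described in the last bullet: trimming shuffle width, caching a DataFrame that feeds multiple outputs, and sizing executors explicitly. The configuration values and paths are illustrative assumptions, not settings from an actual cluster.

```python
# Hypothetical tuning sketch: fewer shuffle partitions, explicit executor
# sizing, and caching of a reused DataFrame. Paths and numbers are examples.
from pyspark import StorageLevel
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("spark-tuning-sketch")
    .config("spark.executor.instances", "8")
    .config("spark.executor.cores", "4")
    .config("spark.executor.memory", "8g")
    .config("spark.sql.shuffle.partitions", "64")  # default 200 is often too wide
    .getOrCreate()
)

events = spark.read.parquet("/data/curated/events")   # hypothetical path

# Persist the filtered set once so both aggregations reuse it
active = (
    events.filter(F.col("status") == "active")
    .persist(StorageLevel.MEMORY_AND_DISK)
)

daily_counts = active.groupBy("event_date").count()
by_region = active.groupBy("region").agg(F.sum("amount").alias("total_amount"))

daily_counts.write.mode("overwrite").parquet("/data/marts/daily_counts")
by_region.write.mode("overwrite").parquet("/data/marts/by_region")

active.unpersist()
```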

Environment: Informatica, DataStage, Hadoop, Azure, SQL Server Management Studio (SSMS), Shell Scripting, AWS, Scala, Sqoop, Hive, Oracle, MicroStrategy, Tableau, PL/SQL, Java, UNIX

Confidential, MOLINE, IL

Data Engineer

Responsibilities:

  • Extracted and profiled data from the customer, commercial loans and retail source systems that would provide the data needed for the loan reporting requirements
  • Determined criteria and wrote scripts for technical and business data quality checks, error handling and rejected reports during the data quality stage
  • Provided inputs on the design of the physical and logical architecture, source-to-target mappings of the data warehouse, and the ETL process.
  • Created UNIX shell scripts to run the Informatica workflows and control the ETL flow.
  • Created Hive tables and loaded data from HDFS to Hive tables as per the requirement.
  • Processing complex XML and XSLT files and generating derived fields to be loaded into the database.
  • Converting large XML files into multiple XML files as required by downstream applications.
  • Loading the processed XML files into the database tables.
  • Mapping source files and generating target files in multiple formats such as XML, Excel, and CSV.
  • Transforming data and reports retrieved from various sources and generating derived fields.
  • Writing complex SQL queries to validate the reports.
  • Writing user-defined functions to transform data into the required formats.
  • Developing Talend jobs using context variables and scheduling the jobs to run automatically.
  • Extensively worked on Data Mapper to map complex JSON formats to XML.
  • Copying data to AWS S3 for storage and using the COPY command to load data into Redshift; used Talend connectors integrated with Redshift (a brief load sketch follows this list).
  • BI development for multiple technical projects running in parallel
  • Participate in development and implementation of product roadmap
  • Create technical blueprints and solution architecture diagrams
  • Troubleshoot and resolve incidents
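A minimal Python sketch of the S3-to-Redshift load pattern referenced above, shown here with boto3 and the Redshift COPY command rather than the Talend connectors. The bucket, table, IAM role, and connection details are hypothetical placeholders.

```python
# Hypothetical sketch: stage an extract in S3, then load it into Redshift
# with COPY. Every identifier below is a placeholder.
import boto3
import psycopg2

# 1) Upload the extract to S3
s3 = boto3.client("s3")
s3.upload_file("orders.csv", "etl-staging-bucket", "loads/orders.csv")

# 2) Issue COPY against Redshift (the IAM role must allow S3 read)
copy_sql = """
    COPY analytics.orders
    FROM 's3://etl-staging-bucket/loads/orders.csv'
    IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
    FORMAT AS CSV
    IGNOREHEADER 1;
"""

conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dw",
    user="etl_user",
    password="REPLACE_ME",   # pull from a secrets manager in practice
)
try:
    with conn, conn.cursor() as cur:
        cur.execute(copy_sql)   # committed when the 'with conn' block exits cleanly
finally:
    conn.close()
```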

Environment: Talend, Hadoop, Hortonworks, DataStage, AWS, Redshift, UNIX, Hive, Informatica, Control-M

Confidential, Waltham, MA

Data analyst/ BI developer

Responsibilities:

  • Extracted data from five operational databases containing almost two terabytes of data, loaded into the data warehouse and subsequently populated seven data marts
  • Created complex transformations, mappings, mapplets, reusable items, scheduled workflows based on the business logic and rules.
  • Used T-SQL in SQL Server Management Studio (SSMS) to develop Stored Procedures, User-Defined Functions (UDFs) and Views
  • Developed ETL job workflows with QC reporting and analysis frameworks
  • Developed Informatica mappings, lookups, reusable components, sessions, workflows, etc. (on the ETL side) as per the design documents and communication.
  • Designed metadata tables at the source staging layer to profile data and perform impact analysis.
  • Performed query tuning and optimizer setting adjustments on the Oracle database (rule- and cost-based).
  • Created Cardinalities, Contexts, Joins and Aliases for resolving loops and checked the data integrity
  • Debugged issues, fixed critical bugs and assisted in code deployments to QA and production
  • Coordinated with the external teams to assure the quality of master data and conduct UAT/integration testing
  • Implemented PowerExchange CDC for mainframes to load certain large data modules into the data warehouse and capture changed data.
  • Designed and developed exception handling, data standardization procedures, and quality assurance controls (a brief reconciliation sketch follows this list).
  • Used Cognos for analysis and presentation layers
  • Develop Cognos 10 cubes using Framework Manager, Report Studio and Query Studio
  • Provide performance management and tuning
  • Develop in several BI reporting tool suites
  • Provide technical oversight to consultant partners
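A minimal Python sketch of the kind of quality-assurance control referenced in this section: reconciling row counts between a staging source and the warehouse target after a load. The DSNs, schemas, and table names are hypothetical placeholders; the production controls themselves were built with Informatica and SQL.

```python
# Hypothetical QA control: compare row counts between staging and warehouse
# tables for one load date and fail on a mismatch. Names are placeholders.
import sys
import pyodbc

LOAD_DATE = "2015-06-30"   # example load date

def row_count(dsn: str, table: str, date_col: str) -> int:
    """Return the row count for the load date in the given table."""
    with pyodbc.connect(dsn) as conn:
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT(*) FROM {table} WHERE {date_col} = ?", LOAD_DATE)
        return cur.fetchone()[0]

staging_count = row_count("DSN=STAGING_DB", "stg.loan_txn", "load_date")
warehouse_count = row_count("DSN=WAREHOUSE_DB", "dw.fact_loan_txn", "load_date")

if staging_count != warehouse_count:
    print(f"QA FAIL: staging={staging_count} warehouse={warehouse_count}")
    sys.exit(1)   # non-zero exit lets the scheduler stop downstream jobs
print(f"QA PASS: {staging_count} rows reconciled for {LOAD_DATE}")
```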

Environment: Informatica, Java/SOAP/Web Services, SQL Server Management Studio (SSMS), MS Visual Studio 2008, Oracle, DB2, SAS, Shell Scripting, TOAD, SQL Plus, Scheduler
