
Sr. Data Engineer Resume


Phoenix, AZ

SUMMARY

  • Seven-plus years of experience in Analysis, Design, Development, and Implementation as a Data Engineer. Expert in providing ETL solutions for any type of business model.
  • Expertise in transforming business requirements into analytical models, designing algorithms, building models, and developing Data Mining, Data Acquisition, Data Preparation, Data Manipulation, Feature Engineering, Machine Learning, Validation, Visualization, and reporting solutions that scale across massive volumes of structured and unstructured data.
  • Fluent programming experience with Scala, Java, Python, SQL, T-SQL, and R.
  • Hands-on experience in developing and deploying enterprise-based applications using major Hadoop ecosystem components like MapReduce, YARN, Hive, HBase, Flume, Sqoop, Spark MLlib, Spark GraphX, Spark SQL, Kafka.
  • Adept at configuring and installing Hadoop/Spark Ecosystem Components.
  • Proficient with Spark Core, Spark SQL, Spark MLlib, Spark GraphX, and Spark Streaming for processing and transforming complex data in Scala using in-memory computing capabilities. Worked with Spark to improve the efficiency of existing algorithms using SparkContext, Spark SQL, Spark MLlib, DataFrames, pair RDDs, and Spark on YARN.
  • Experience in the development and design of various scalable systems using Hadoop technologies in various environments. Extensive experience in analyzing data using Hadoop ecosystem components including HDFS, MapReduce, Hive, and Pig.
  • Experience in understanding the security requirements for Hadoop.
  • Extensive experience working with Informatica PowerCenter.
  • Implemented integration solutions for cloud platforms with Informatica Cloud.
  • Worked with Talend, a Java-based ETL tool.
  • Proficient in SQL, PL/SQL, and Python coding.
  • Experience developing on-premises and real-time processes.
  • Excellent understanding of Enterprise Data Warehouse best practices; involved in full life cycle development of Data Warehousing.
  • Expertise in DBMS concepts.
  • Involved in building Data Models and Dimensional Modeling with 3NF, Star, and Snowflake schemas for OLAP and Operational Data Store (ODS) applications.
  • Skilled in designing and implementing ETL Architecture for a cost-effective and efficient environment.
  • Optimized and tuned ETL processes & SQL Queries for better performance.
  • Performed complex data analysis and provided critical reports to support various departments.
  • Worked with Business Intelligence tools like Business Objects and Data Visualization tools like Tableau.
  • Extensive Shell/Python scripting experience for scheduling and process automation.
  • Good exposure to Development, Testing, Implementation, Documentation and Production support.
  • Develop effective working relationships with client teams to understand and support requirements, develop tactical and strategic plans to implement technology solutions, and effectively manage client expectations.
  • Ability to handle multiple tasks and work well in a team environment.

TECHNICAL SKILLS

Databases: Oracle, MySQL, SQLite, SQL Server 2014, Teradata, Netezza (RDBMS); HBase 1.2, MongoDB 3.2, Cassandra (NoSQL)

Big Data Frameworks: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, ZooKeeper, Flume, HBase, AWS (Amazon EC2, S3, and Redshift), Spark, Storm, Impala, Kafka

Cloud Technologies: AWS, Azure, Google Cloud Platform (GCP)

Ensemble and Stacking: Averaged Ensembles, Weighted Averaging, Base Learning, Meta Learning, Majority Voting, Stacked Ensemble, AutoML - Scikit-Learn, MLjar, etc.

BI Tools: Business Objects XI, Tableau 9.1

Programming/Query Languages: Java, Python, Scala, SQL, PL/SQL, T-SQL, Linux shell scripts

Data Engineer/Big Data Tools / Cloud / Visualization / Other Tools: Databricks, Hadoop Distributed File System (HDFS), Hive, Pig, Sqoop, MapReduce, Spring Boot, Flume, YARN, Hortonworks, Cloudera, Mahout, MLlib, Oozie, ZooKeeper, AWS, Azure Databricks, Azure Data Explorer, Azure HDInsight, Salesforce, NiFi, GCP, Google Cloud Shell, Linux, BigQuery, Bash shell, Unix, Tableau, Power BI, SAS, Web Intelligence, Crystal Reports, Dashboard Design.

PROFESSIONAL EXPERIENCE

Confidential, Phoenix, AZ

Sr. Data Engineer

Responsibilities:

  • Responsible for the execution of big data analytics, predictive analytics, and machine learning initiatives.
  • Implemented a proof of concept deploying this product in an AWS S3 bucket and Snowflake.
  • Utilized AWS services with a focus on big data architecture/analytics/enterprise data warehouse and business intelligence solutions to ensure optimal architecture, scalability, flexibility, availability, and performance, and to provide meaningful and valuable information for better decision-making.
  • Developed Scala scripts and UDFs using both DataFrames/SQL and RDDs in Spark for data aggregation, queries, and writing results back into the S3 bucket.
  • Experience in data cleansing and data mining.
  • Wrote, compiled, and executed programs as necessary using Apache Spark in Scala to perform ETL jobs with ingested data.
  • Used Spark Streaming to divide streaming data into batches as input to the Spark engine for batch processing.
  • Wrote Spark applications for data validation, cleansing, transformation, and custom aggregation, used the Spark engine and Spark SQL for data analysis, and provided the results to the data scientists for further analysis.
  • Prepared scripts to automate the ingestion process using Python and Scala as needed from various sources such as APIs, AWS S3, Teradata, and Snowflake.
  • Designed and developed Spark workflows using Scala to pull data from the AWS S3 bucket and Snowflake and apply transformations to it (the PySpark sketch after this list illustrates the same read/transform/write pattern).
  • Implemented Spark RDD transformations to map business analysis logic and applied actions on top of those transformations.
  • Automated the resulting scripts and workflows using Apache Airflow and shell scripting to ensure daily execution in production (a minimal Airflow DAG sketch follows this list).
  • Created Python scripts to read CSV, JSON, and Parquet files from S3 buckets and load them into AWS S3, DynamoDB, and Snowflake.
  • Implemented AWS Lambda functions to run scripts in response to events in Amazon DynamoDB tables or S3 buckets, or to HTTP requests via Amazon API Gateway (see the Lambda handler sketch after this list).
  • Migrated data from the AWS S3 bucket to Snowflake by writing custom read/write Snowflake utility functions in Scala.
  • Worked on Snowflake schemas and data warehousing, and processed batch and streaming data load pipelines using Snowpipe and Matillion from the data lake in the Confidential AWS S3 bucket.
  • Utilized Apache Spark with Python to develop and execute big data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib.
  • Developed and deployed various Lambda functions in AWS with built-in AWS Lambda libraries, and also deployed Lambda functions in Scala with custom libraries.
  • Actively involved in writing T-SQL to implement stored procedures, functions, cursors, and views for different tasks.
  • Used Apache Falcon for mirroring of HDFS and Hive data.
  • Participated in the design, build, and deployment of NoSQL implementations like MongoDB.
  • Extensively used Apache Kafka, Apache Spark, HDFS, and Apache Impala to build near real-time data pipelines that ingest, transform, store, and analyze clickstream data to provide a better personalized user experience.
  • Worked on big data with AWS cloud services, i.e., EC2, S3, EMR, and DynamoDB.
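The Spark work above centers on reading data from S3, applying DataFrame/SQL transformations, and writing results back. Below is a minimal PySpark sketch of that read/aggregate/write pattern (the actual jobs were also written in Scala); the bucket paths and column names are hypothetical placeholders, not the production schema.

```python
# Minimal PySpark sketch of the S3 read -> aggregate -> write pattern described above.
# Bucket paths and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-aggregation-example").getOrCreate()

# Read raw Parquet files landed in an S3 bucket (assumes the S3A connector and credentials are configured).
orders = spark.read.parquet("s3a://example-raw-bucket/orders/")

# Apply DataFrame transformations and aggregations before writing back to S3.
daily_totals = (
    orders
    .filter(F.col("status") == "COMPLETED")
    .groupBy("order_date", "region")
    .agg(F.sum("amount").alias("total_amount"),
         F.count("*").alias("order_count"))
)

# Write the aggregated result to a curated S3 location, partitioned by date.
(daily_totals.write
 .mode("overwrite")
 .partitionBy("order_date")
 .parquet("s3a://example-curated-bucket/daily_totals/"))
```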
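The Lambda functions above react to S3 and DynamoDB events. The sketch below shows an S3-triggered handler that records newly created objects in a DynamoDB table; the table name and item attributes are assumptions for illustration, not the production setup.

```python
# Hypothetical AWS Lambda handler for S3 "ObjectCreated" events, illustrating the
# event-driven pattern described above. Table and attribute names are placeholders.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
audit_table = dynamodb.Table("example-file-audit")  # hypothetical DynamoDB table

def lambda_handler(event, context):
    # Each record describes one object created in the monitored bucket.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        size = record["s3"]["object"].get("size", 0)

        # Register the file's arrival so downstream jobs can pick it up.
        audit_table.put_item(Item={"s3_key": key, "bucket": bucket, "size_bytes": size})

    return {"statusCode": 200, "body": json.dumps("processed")}
```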
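The daily Airflow automation mentioned above can be pictured as a small DAG like the one below; the DAG id, schedule, and script paths are hypothetical, and the two tasks simply stand in for the ingestion and load steps.

```python
# Minimal Airflow DAG sketch for the daily automation described above.
# DAG id, schedule, and script paths are assumptions for illustration.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="daily_s3_to_snowflake",          # hypothetical DAG name
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    ingest = BashOperator(
        task_id="ingest_from_s3",
        bash_command="spark-submit /opt/jobs/ingest_s3.py",   # hypothetical Spark job
    )
    load = BashOperator(
        task_id="load_to_snowflake",
        bash_command="python /opt/jobs/load_snowflake.py",    # hypothetical loader script
    )

    # Run the Snowflake load only after the S3 ingestion succeeds.
    ingest >> load
```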

Confidential, Charlotte, NC

Sr. Data Engineer

Responsibilities:

  • Developed Spark programs with Python and applied principles of functional programming to process complex structured data sets.
  • Worked in a fast-paced agile development environment to quickly analyze, develop, and test potential use cases for the business.
  • Responsible for the design and development of high-performance data architectures supporting data warehousing, real-time ETL, and batch big data processing.
  • Worked with Hadoop infrastructure to store data in HDFS and used Spark/Hive SQL to migrate the underlying SQL codebase to AWS.
  • Converted Hive/SQL queries into Spark transformations using Spark RDDs and PySpark.
  • Analyzed SQL scripts and designed the solution to be implemented using PySpark.
  • Exported tables from Teradata to HDFS using Sqoop and built tables in Hive.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/big data concepts.
  • Used Spark SQL to load JSON data, create schema RDDs, and load them into Hive tables, and handled structured data using Spark SQL (see the Spark SQL sketch after this list).
  • Worked with the Hadoop ecosystem, implemented Spark using Scala, and utilized the DataFrame and Spark SQL APIs for faster processing of data.
  • Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
  • Developed RDDs/DataFrames in Spark and applied several transformation steps to load data from Hadoop data lakes.
  • Filtered and cleaned data using Scala code and SQL queries.
  • Performed analysis of implementing Spark using Scala and wrote sample Spark programs using PySpark.
  • Analyzed sales data from the past few years using data mining tools like RStudio and Python. Involved in designing and optimizing Spark SQL queries and DataFrames to import data from data sources, perform transformations, perform read/write operations, and save the results to an output directory in HDFS/AWS S3.
  • Responsible for building scalable distributed data solutions in a cluster environment with Amazon EMR.
  • Worked with the Kafka REST API to collect and load data into the Hadoop file system, and used Sqoop to load data from relational databases.
  • Extracted real-time feeds using Kafka and Spark Streaming, converted them to RDDs, processed the data as DataFrames, and saved the data in Parquet format in HDFS (the streaming sketch after this list illustrates the flow).
  • Successfully loaded files into Hive and HDFS from Oracle and SQL Server using Sqoop.
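The Spark SQL work above loads JSON data and persists it as Hive tables. A minimal PySpark sketch of that flow is below; the paths, database, table, and column names are illustrative assumptions.

```python
# Sketch of loading JSON into a Hive table with Spark SQL, as described above.
# Paths, database/table names, and columns are illustrative assumptions.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("json-to-hive-example")
         .enableHiveSupport()      # needed so saveAsTable writes to the Hive metastore
         .getOrCreate())

# Spark infers the schema directly from the JSON files.
events = spark.read.json("hdfs:///data/landing/events/*.json")

# Register a temporary view so the data can be shaped with Spark SQL.
events.createOrReplaceTempView("events_staging")
cleaned = spark.sql("""
    SELECT event_id, user_id, event_type, CAST(event_ts AS TIMESTAMP) AS event_ts
    FROM events_staging
    WHERE event_id IS NOT NULL
""")

# Persist the structured result as a managed Hive table.
cleaned.write.mode("overwrite").saveAsTable("analytics.events")
```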
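For the Kafka feed that ends up as Parquet in HDFS, here is a minimal sketch. The bullet above describes the RDD/DStream style; this sketch uses the equivalent Structured Streaming API, and the broker, topic, schema, and paths are assumptions (it also assumes the spark-sql-kafka package is available).

```python
# Minimal sketch of the Kafka -> Spark -> Parquet-on-HDFS flow described above,
# written against the Structured Streaming API. Topic, broker, schema, and paths
# are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

spark = SparkSession.builder.appName("kafka-clickstream-example").getOrCreate()

# Assumed shape of each clickstream event.
schema = (StructType()
          .add("user_id", StringType())
          .add("page", StringType())
          .add("duration", DoubleType()))

# Subscribe to the (hypothetical) clickstream topic.
raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker1:9092")
       .option("subscribe", "clickstream")
       .load())

# Kafka delivers the payload as bytes in the `value` column; parse it as JSON.
events = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(F.from_json("json", schema).alias("e"))
          .select("e.*"))

# Continuously append the parsed events to HDFS as Parquet.
query = (events.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/clickstream/parquet")
         .option("checkpointLocation", "hdfs:///checkpoints/clickstream")
         .outputMode("append")
         .start())

query.awaitTermination()
```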

Confidential

Data Engineer

Responsibilities:

  • Implemented a reporting data warehouse with online transaction system data.
  • Developed and maintained data warehouse for PSN project.
  • Provided reports and publications to Third Parties for Royalty payments.
  • Managed user accounts, groups, and workspace creation for different users in PowerCenter.
  • Wrote complex UNIX/Windows scripts for file transfers and emailing tasks over FTP/SFTP (a Python illustration of this pattern follows this list).
  • Worked with PL/SQL procedures and used them in Stored Procedure transformations.
  • Extensively worked on Oracle and SQL Server; wrote complex SQL queries against the ERP system for data analysis purposes.
  • Worked on the most critical finance projects and was the go-to person for any data-related issues for team members.
  • Migrated ETL code from Talend to Informatica. Involved in development, testing, and post-production support for the entire migration project.
  • Documented the code.
  • Tuned ETL jobs in the new environment after fully understanding the existing code.
  • Maintained Talend admin console and provided quick assistance on production jobs.
  • Involved in designing Business Objects universes and creating reports.
  • Built ad hoc reports using stand-alone tables.
  • Involved in creating and modifying new and existing Web Intelligence reports.
  • Created publications that split into various reports based on the specific vendor.
  • Wrote Custom SQL for some complex reports.
  • Worked with internal and external business partners during requirement gathering.
  • Worked closely with business analysts and report developers in writing the source-to-target specifications for data warehouse tables based on business requirements.
  • Exported data into Excel for business meetings, which made discussions easier while reviewing the data.
  • Performed analysis after requirements gathering and walked team through major impacts.
  • Provided and debugged crucial reports for finance teams during month end period.
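The file transfer and notification scripts above were written as UNIX/Windows shell scripts; purely as an illustrative Python equivalent of the same pattern (standard library only), one might write something like the sketch below. The host names, credentials, file names, and email addresses are placeholders.

```python
# Illustrative Python equivalent of the shell-based FTP transfer + email notification
# described above. Hosts, credentials, file names, and addresses are placeholders.
import smtplib
from email.message import EmailMessage
from ftplib import FTP

LOCAL_FILE = "royalty_extract.csv"   # hypothetical extract produced by the ETL job

# Push the extract to the partner FTP site.
with FTP("ftp.example.com") as ftp:
    ftp.login(user="etl_user", passwd="changeme")   # placeholder credentials
    with open(LOCAL_FILE, "rb") as fh:
        ftp.storbinary(f"STOR {LOCAL_FILE}", fh)

# Notify the business team that the file was delivered.
msg = EmailMessage()
msg["Subject"] = f"{LOCAL_FILE} delivered to partner FTP"
msg["From"] = "etl@example.com"
msg["To"] = "finance-team@example.com"
msg.set_content("The nightly extract was transferred successfully.")

with smtplib.SMTP("mail.example.com") as smtp:
    smtp.send_message(msg)
```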

Confidential

Data Engineer

Responsibilities:

  • Analyzed and cleansed raw data using HiveQL.
  • Experience in data transformations using MapReduce and Hive for different file formats.
  • Involved in converting Hive/SQL queries into transformations using Python.
  • Performed complex joins on Hive tables with various optimization techniques.
  • Created Hive tables as per requirements, defining internal or external tables with appropriate static and dynamic partitions for efficiency (see the partitioning sketch after this list).
  • Worked extensively with Hive DDLs and Hive Query Language (HQL).
  • Involved in loading data from edge node to HDFS using shell scripting.
  • Understood and managed Hadoop log files.
  • Managed Hadoop infrastructure with Cloudera Manager.
  • Created and maintained technical documentation for launching Hadoop cluster and for executing Hive queries.
  • Built integrations between applications, primarily Salesforce.
  • Extensive work in Informatica Cloud.
  • Expertise in Informatica Cloud apps: Data Synchronization (DS), Data Replication (DR), task flows, mapping configurations, and real-time apps like Process Designer and Process Developer.
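The partitioned Hive table work above was done in HiveQL directly; the sketch below shows the same kind of external-table DDL and a dynamic-partition insert, issued here through PySpark's Hive support. The database, table, column, and path names are hypothetical.

```python
# Illustrative sketch of the partitioned Hive table pattern described above,
# issued through PySpark's Hive support. The original work used HiveQL directly;
# all database/table/column/path names here are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-partition-example")
         .enableHiveSupport()
         .getOrCreate())

# External table over files already on HDFS, partitioned by load date.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS staging.web_logs (
        user_id STRING,
        url     STRING,
        status  INT
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/staging/web_logs'
""")

# Dynamic partitioning lets a single INSERT populate many load_date partitions.
spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
spark.sql("""
    INSERT OVERWRITE TABLE staging.web_logs PARTITION (load_date)
    SELECT user_id, url, status, load_date
    FROM staging.web_logs_raw      -- hypothetical source table
""")
```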

Confidential

ETL Developer

Responsibilities:

  • Gathered business requirements and prepared technical design documents, target-to-source mapping documents, and mapping specification documents.
  • Extensively worked on Informatica PowerCenter.
  • Parsed complex files through Informatica data transformations and loaded them into the database.
  • Optimized query performance using Oracle hints, forced indexes, constraint-based loading, and a few other approaches.
  • Extensively worked on UNIX shell scripting to split groups of files into smaller files and to automate file transfers.
  • Worked with the AutoSys scheduler for scheduling different processes.
  • Performed basic and unit testing.
  • Assisted in UAT testing and provided necessary reports to the business users.
  • Coordinated design reviews and ETL code reviews with teammates.
  • Developed mappings using Informatica to load data from sources such as relational tables and sequential files into the target system.
  • Extensively worked with Informatica transformations.
  • Extensively worked on UNIX Shell Scripting for file transfer and error logging.
