
Senior Data Engineer Resume


SUMMARY:

  • Data Engineer with 7 years of experience in transforming industry requirements into analytical standards, designing algorithms, building models, and developing data mining and reporting solutions that scale across a tremendous volume of structured and unstructured data.
  • Experienced with processing different file formats like Avro, XML, JSON, and Sequence file formats using MapReduce programs.
  • Hands-on experience in the data mining process, implementing complex business logic, optimizing queries using HiveQL, and controlling data distribution by partitioning and bucketing techniques to enhance performance.
  • Experience using IDE tools such as Eclipse 3.0, MyEclipse, and NetBeans; expertise working in a variety of industries including Banking and Healthcare. Expert knowledge of the SDLC (Software Development Life Cycle) and involved in all phases of projects.
  • Experienced in using IDEs and Tools like Eclipse, GitHub, Maven, PyCharm.
  • Hands-on experience in analyzing requirements, identifying and developing the right ETL design to achieve customer business goals.
  • Experienced in database design, data analysis, development, SQL performance tuning, data warehousing ETL processes, and data conversions.
  • Experience in writing UDFs using Java and Scala for HiveQL and Spark.
  • Expertise in using cloud-based managed services for data warehousing in Confidential Azure (Azure Data Lake Storage, Azure Data Factory).
  • Strong experience with big data processing using Hadoop technologies: MapReduce, Apache Spark, Apache Hive, and Pig.
  • Experience in designing and creating RDBMS tables, views, user-created data types, indexes, stored procedures, cursors, triggers, and transactions.
  • Expertise in data analysis and development using SQL Server Reporting Services (SSRS), SQL Server Integration Services (SSIS), and SQL Server Analysis Services (SSAS).

  • Expert in the usage of Cloud SQL, Bigtable, and Datastore.
  • Experience in interpreting data using Python, R, SQL, Microsoft Excel, Hive, PySpark, and Spark SQL for Data Mining, Data Cleansing, Data Munging, and Machine Learning (see the PySpark sketch after this list).
  • Experienced in working with various Python IDEs such as PyCharm, PyScripter, Spyder, PyStudio, PyDev, IDLE, NetBeans, and Sublime Text.
  • Experience in Data Warehousing and Database Management. Areas of the profession include Data Analysis, Data Integration (ETL), Data Architecture, Data Modelling (both Dimensional and Relational models), Database Design, Data Federation, Metadata/Semantic/Universe Design, Static and OLAP/Cube Reporting, and Testing.

  • Hands-on experience in handling database issues and connections with SQL and NoSQL databases like MongoDB, Cassandra, Redis, CouchDB, and DynamoDB by installing and configuring various packages in Python.
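
As a minimal illustration of the PySpark and Spark SQL data-cleansing work listed above, the sketch below cleans a hypothetical customer extract; the S3 path and column names are placeholders, not details from any actual engagement.

    # Minimal PySpark data-cleansing sketch (hypothetical input path and column names).
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("customer_cleansing").getOrCreate()

    # Read the raw CSV extract; schema inference is acceptable for a sketch.
    raw = (
        spark.read.option("header", True)
             .option("inferSchema", True)
             .csv("s3://example-bucket/raw/customers.csv")
    )

    # Basic cleansing: trim and normalize strings, drop duplicates, require a key.
    clean = (
        raw.withColumn("email", F.lower(F.trim(F.col("email"))))
           .withColumn("state", F.upper(F.trim(F.col("state"))))
           .dropDuplicates(["customer_id"])
           .filter(F.col("customer_id").isNotNull())
    )

    # Expose the cleansed data to Spark SQL for quick profiling.
    clean.createOrReplaceTempView("customers_clean")
    spark.sql(
        "SELECT state, COUNT(*) AS n FROM customers_clean GROUP BY state ORDER BY n DESC"
    ).show()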

WORK EXPERIENCE:

Senior Data Engineer

Confidential

Responsibilities:

  • Wrote Python scripts to automate the ingestion process into AWS S3 and Redshift (see the Airflow sketch after this list).
  • Designed several DAGs (Directed Acyclic Graphs) for automating ETL pipelines.
  • Automated workflow using Apache Airflow and shell scripting to ensure daily execution in production.
  • Worked on different file formats like Sequence files, XML files and Map files using MapReduce Programs.
  • Implemented a one-time data migration of multi-state level data from SQL Server to Snowflake using Python and SnowSQL.
  • Developed ETL workflows using AWS Glue and Spark, sourcing data from Teradata, S3, and Redshift.
  • Built machine learning models to analyze driving behavior on big data.
  • Identified and documented Functional/Non-Functional and other related business decisions for implementing Actimize-SAM to comply with AML Regulations.
  • Prepared SQL queries for profiling semi-structured data to identify patterns and implement data quality metrics.
  • Design, develop, test, implement, and maintain data processes of moderate to large size and complexity.
  • Developed multiple ETL jobs and automated them using Alteryx and Alteryx Server.
  • Good understanding of the NoSQL MongoDB database and of how front-end application event message information is stored in MongoDB.
  • Following AGILE development methodology for the complete life cycle of the project.
  • Used AWS EMR, AWS Glue, and PySpark to move and transform large amounts of data into S3, DynamoDB, and Redshift.
  • Used Agile for software development lifecycle
  • Ingested data from disparate sources, including but not limited to internal production reporting systems, public government data, and paid databases, using a combination of SQL, Python, and system-specific APIs to create data views to be used in BI tools like TIBCO Spotfire.
  • Built ELT (Extract, Load, Transform) pipelines.
  • Interacted with business users, Scrum Masters, and peer engineers to analyze system requirements and to design, develop, and test end-to-end web-based applications.
  • Implemented Python and Go scripts and REST APIs to connect different file systems using big data technologies like HDFS, MapReduce, and Hive, database systems like Oracle, PostgreSQL, MySQL, Cassandra, and MongoDB, and external systems like SAP for ETL processes.
  • Migrated pipelines from Kubeflow to Databricks.
  • Wrote pre- and post-session SQL commands (DDL & DML) to drop and recreate the indexes on the data warehouse.
  • Collaborated with the team as and when required to help solve ETL and data related issues.
  • Used Python libraries and SQL queries/subqueries to create several datasets which produced statistics, tables, figures, charts, and graphs.
  • Maintained the daily ETL schedule, recovered daily failures, and generated daily reports for users.
  • Good knowledge of AWS concepts like EMR and EC2 web services, which provide fast and efficient processing of big data.
  • Collaborated with data engineers and the operations team to implement ETL processes and Snowflake models; wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
  • Developed multiple Kafka producers and consumers from scratch as per the requirement specifications.
  • Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
  • Built trustworthy data assets aligned with business objectives using tools like SQL and MySQL, which support the tools utilized by data scientists.
  • Handled data in real time: developed a Kafka application using Python, Spark, Confluent Kafka, Docker, and OpenShift to provide real-time data to 4 different teams to consume in their dashboards.
  • Designed CI/CD pipelines; hands-on knowledge of CI/CD tools like Jenkins, Git, and Kanban.

  • Set up service principals to allow access to Databricks through Terraform.
  • Wrote and refactored SQL ETL pipelines in Facebook's internal Airflow using Presto and HiveQL on TBs of messaging data, monitoring data quality checks and managing SLAs.
  • Substantial knowledge of data engineering and building ETL pipelines on batch and streaming data using PySpark and Spark SQL; suitable command of writing queries in Teradata.
  • Skilled in building data pipelines in AWS using services like S3, EC2, and EMR.
  • Created UNIX shell scripts for automating regular maintenance activities using Informatica Command line utilities.
  • Used the DataFrame API in Scala for converting distributed collections of data organized into named columns, developing predictive analytics using Apache Spark and Scala APIs.
  • Created ETL jobs to load JSON data and server data into HDFS and transported it from HDFS into the Teradata data warehouse.
  • Used Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS using Python and NoSQL databases such as HBase and Cassandra.
  • Developed various ETL and ELT jobs using Alteryx and Qlik Compose.
  • Developed Spark SQL applications to perform complex data operations on structured and semi-structured data stored as Parquet, JSON, and XML files in S3 buckets.
  • Visualized queried data with Tableau dashboards and Python to provide stakeholders with actionable insight to use for business improvements.
  • Used Spark SQL to load JSON data, create a SchemaRDD, and load it into Hive tables, and handled structured data using Spark SQL.
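
The sketch below illustrates the S3/Redshift ingestion automation referenced at the top of this list. It is a minimal Airflow 2.x DAG, not the production pipeline; the bucket, cluster, table, and IAM role names are assumed placeholders.

    # Daily ingestion sketch: land a file in S3, then COPY it into Redshift.
    from datetime import datetime

    import boto3
    import psycopg2
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def upload_to_s3():
        # Push the day's extract into the raw zone of the bucket.
        boto3.client("s3").upload_file(
            "/tmp/daily_extract.csv", "example-raw-bucket", "ingest/daily_extract.csv"
        )

    def copy_to_redshift():
        # COPY from S3 into a staging table; Redshift reads the file directly from S3.
        conn = psycopg2.connect(
            host="example-cluster.redshift.amazonaws.com",
            port=5439, dbname="analytics", user="etl_user", password="***",
        )
        with conn, conn.cursor() as cur:
            cur.execute("""
                COPY staging.daily_extract
                FROM 's3://example-raw-bucket/ingest/daily_extract.csv'
                IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
                CSV IGNOREHEADER 1;
            """)
        conn.close()

    with DAG(
        dag_id="s3_to_redshift_ingestion",
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        load_s3 = PythonOperator(task_id="upload_to_s3", python_callable=upload_to_s3)
        load_redshift = PythonOperator(task_id="copy_to_redshift", python_callable=copy_to_redshift)
        load_s3 >> load_redshift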

Environment: Big Data, Hadoop, Hive, Sqoop, Spark SQL, Spark, HDFS, SQL Server 2014/2016, Oracle 12c/11g, DB2, Toad, Unix, Python, Snowflake, Teradata 14, Erwin, DataStage, Autosys

Data Engineer

Confidential: Austin, TX

Responsibilities:

  • Responsible for the back-end development of the data platform and management of both streaming and batch data.
  • Maintained and developed complex SQL queries, Views, Functions and reports that qualify customer requirements on Snowflake.
  • Built stored procedures in MS SQL to fetch data from several servers via FTP and process the files in order to update the tables.
  • Developed ETL code as per the business requirements by effectively interacting with all the source systems involved.
  • Developed C++ classes using JNI to simplify the interfacing of the core modules and the Java GUI.
  • Processed data with Scala, Spark, and Spark SQL and loaded it into Hive partitioned tables in Parquet file format.
  • Created the build systems for compilation and deployment on Linux, Windows, and Mac OS X operating systems.
  • Developed a data analysis & grading tool using VBA & Power BI that used classical reservoir engineering techniques to identify and rank drilling & workover potential within new and existing assets to maximize ROI.
  • Maintained and updated the Java GUI developed with Swing.
  • Extensively wrote SQL scripts against Facets Enrollment tables (Sybase) & the Application DB (SQL Server) and validated the values against the EDI file.
  • Utilized AWS S3 services to push/store and pull the data from AWS from external applications.
  • Involved in developing CI/CD (Continuous Integration and Continuous Delivery) using Jenkins.

  • Designed end-to-end ETL processes to support reporting requirements, designing aggregates, summary tables, and materialized views for reporting.
  • Worked with real-time data streams & APIs from multiple internal/external sources.
  • Assembled the infrastructure required for optimal extraction, transformation, and loading of data from a vast mixture of data sources using SQL, ELT tools, Python, or 'big data' technologies.
  • Built AI/machine learning algorithms like Classification, Regression, and Deep Learning using Python.

  • Used Spark Streaming to receive real-time data from Kafka and store the streamed data to HDFS using Python and NoSQL databases such as HBase and Cassandra (see the streaming sketch after this list).
  • Awarded Best Debutant and Most Valuable Player for consistent performance and problem-solving analysis.
  • Designed and developed a Data Warehouse with source data harvested from different assets: wind farms, substations, SAP HANA, market data, and met towers.
  • Designed and developed ETL (extract-transform-load) processes to transform the data, populate data models, etc., using Hadoop, Spark, Python, Redshift, PostgreSQL, and other technologies in the AWS cloud.
  • Maintained the user accounts (IAM), Cloud SQL, Cloud DNS, VPC, RDB, Cloud Datastore, Cloud Bigtable, SES, SQS, and Cloud Pub/Sub services in Google Cloud Platform.
  • Designed and coded data pipeline features and data processing jobs for collecting various data from customer, operations, and financial systems.
  • Involved in agile software methodologies using TEST DRIVEN DEVELOPMENT (TDD).
  • Used JIRA to keep track of bugs to reduce downtime, and increase productivity and communication.
  • Performed code review to ensure that ETL development was done according to the company's ETL standard and that ETL best practices were followed.
  • Built data pipeline components on AWS, Azure & Google Cloud Platform using Apache Airflow, Azure Databricks, and Luigi.
  • Automated analyses and authoring pipelines using SQL and an ETL framework.
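
As a companion to the Kafka/Spark Streaming bullet above, the sketch below shows the same Kafka-to-HDFS pattern using PySpark Structured Streaming (the newer API); the broker, topic, and HDFS paths are placeholders, and the job assumes the spark-sql-kafka connector package is available.

    # Minimal Structured Streaming sketch: consume a Kafka topic and land it on HDFS.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("kafka_to_hdfs").getOrCreate()

    # Subscribe to the topic; Kafka delivers key/value as bytes, so cast to strings.
    events = (
        spark.readStream.format("kafka")
             .option("kafka.bootstrap.servers", "broker1:9092")
             .option("subscribe", "events_topic")
             .load()
             .select(
                 F.col("key").cast("string"),
                 F.col("value").cast("string"),
                 "timestamp",
             )
    )

    # Write micro-batches to HDFS as Parquet; the checkpoint directory tracks offsets.
    query = (
        events.writeStream.format("parquet")
              .option("path", "hdfs:///data/events/raw")
              .option("checkpointLocation", "hdfs:///checkpoints/events_raw")
              .outputMode("append")
              .start()
    )
    query.awaitTermination()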

Environment: ETL, Big Data, Hive, Spark, UNIX, Windows, Oracle 10g, PL/SQL, MS SQL 2008 R2, SQL, Flat Files and CSV, Shell Scripting, PuTTY, PL/SQL Developer.

Data Analyst

Confidential: Tampa, FL

Responsibilities:

  • Worked on different file formats like Sequence files, XML files, and Map files using MapReduce programs; worked with the Avro data serialization system to handle JSON data structures.

  • Converted SQL Server stored procedures to Redshift PostgreSQL and embedded them in the Python pandas framework.
  • Utilized inferential analysis formulas and tools to provide specialized data analysis.
  • Created Excel BI analyses and reports using VLOOKUP, pivot tables, and the DAX language.
  • Worked on development of data warehouse, data lake, and ETL systems using relational and non-relational tools like SQL and NoSQL.
  • Migrated ETL processes from RDBMS to Hive to test easy data manipulation.
  • Performed data extractions, transformations, exploratory data analysis, and statistical analysis in both Alteryx and Python (see the pandas sketch after this list).
  • Performed exploratory data analysis, statistical data analysis, and seasonality and trend identification using predictive analytics in Alteryx, Python, and Tableau.
  • Provided technical expertise in data storage structures, data mining, and data cleansing.
  • Involved in the process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, informing conclusions, and supporting decision-making.
  • Performed data normalization, queries, and schema creation & modification using Microsoft SQL.
  • Analyzed data by writing custom Apache Hadoop MapReduce programs and UDFs for Pig and Hive in Java in order to analyze the data efficiently.
  • Exported the analyzed data to relational databases using Sqoop for performing visualization and generating reports for the Business Intelligence team.
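
To illustrate the exploratory data analysis described above, here is a minimal pandas sketch; the file name and the "order_date"/"revenue" columns are hypothetical, standing in for whichever extract is being profiled.

    # Minimal exploratory-data-analysis sketch in pandas (hypothetical sales extract).
    import pandas as pd

    # Load the extract; parsing the date column makes trend/seasonality views easy.
    df = pd.read_csv("sales_extract.csv", parse_dates=["order_date"])

    # Profile the data: shape, missing values, and summary statistics.
    print(df.shape)
    print(df.isna().sum())
    print(df.describe(include="all"))

    # Simple trend view: monthly revenue totals, a starting point for seasonality checks.
    monthly = (
        df.set_index("order_date")
          .resample("M")["revenue"]
          .sum()
    )
    print(monthly.tail(12))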

Environment: MS Excel, SQL Server Management Studio, Access, Tableau
