Senior Data Engineer Resume Des Moines, IA - Hire IT People

SUMMARY

Over 7 years of experience in Analysis, Design, Development, and Implementation as a Data Engineer
Expert in providing ETL solutions and ETL processes for any type of business model
Develop effective working relationships with client teams to understand and support requirements, develop tactical and strategic plans to implement technology solutions, and effectively manage client expectations
An excellent team member with an ability to perform individually, good interpersonal relations, strong communication skills, hardworking and a high level of motivation
Excellent knowledge of Machine Learning, Mathematical Modelling, and Operations Research. Comfortable with R, Python, SAS and Weka, MATLAB, Relational databases. Deep understanding & exposure of the Big Data Ecosystem
Experience in development and design of various scalable systems using Hadoop technologies in various environments
Extensive experience in analysing data using Hadoop Ecosystems including HDFS, MapReduce, Hive, and Pig
Extensively used Python libraries PySpark, Pytest, PyMongo, cx_Oracle, PyExcel, Boto3, Psycopg, embedPy, NumPy, and Beautiful Soup
Experience in understanding the security requirements for Hadoop
Extensive experience in working with Informatica PowerCenter
Good hands-on expertise with AWS storage services such as S3, EFS, Storage Gateways and AWS compute services such as EC2, Elastic MapReduce (EMR), EBS and accessing Instance metadata
Implemented integration solutions for cloud platforms with Informatica Cloud
Proficient in SQL, PL/SQL, and Python coding. Worked with the Java-based ETL tool Talend
Expertise in debugging and optimizing Oracle and Java performance tuning with strong knowledge in Oracle 11g and SQL
Experience in data warehousing and business intelligence using various ETL tools Informatica, and Business Objects
Experience in developing customized UDFs in Java to extend Hive and Pig Latin functionality
Experience developing On-premises and Real-Time processes
Excellent understanding of best practices of Enterprise Data Warehouse and involved in full life cycle development of Data Warehousing
Experience in Data Analysis, Data Migration, Data Validation, Data Cleansing, Data Verification and identifying Data Mismatch
Good knowledge in using Apache NiFi to automate the data movement between different Hadoop systems
Experience in Big Data technologies like Spark, Spark SQL, PySpark, Hadoop, HDFS, Hive
Expertise in DBMS concepts
Experience in working with Azure Monitoring, Data Factory, Traffic Manager, Service Bus, Key Vault
Involved in building Data Models and Dimensional Modelling with 3NF, Star and Snowflake schemas for OLAP and Operational data store (ODS) applications
Skilled in designing and implementing ETL Architecture for a cost-effective and efficient environment
Optimized and tuned ETL processes & SQL Queries for better performance
Performed complex data analysis and provided critical reports to support various departments
Worked with Business Intelligence tools like Business Objects and Data Visualization tools like Tableau
Extensive Shell/Python scripting experience for Scheduling and Process Automation
Good exposure to Development, Testing, Implementation, Documentation, and Production support
Experience in one or more data platform services such as SQL, CosmosDB, MongoDB, Oracle, Hadoop
Proficiency in multiple databases like MongoDB, Cassandra, MySQL, Oracle, and MS SQL Server
Data Platform development using Spark, Greenplum, and Hadoop
Exposure to NoSQL databases such as MongoDB, HBase, and Cassandra. Created Java apps to handle data in MongoDB and HBase
Designing, building, and publishing Cognos Multi-Dimensional OLAP Cube solutions
Good experience in the design and implementation of fully automated Continuous Integration, Continuous Delivery, Continuous Deployment pipelines, and DevOps processes for Agile projects (CI/CD)
Database design (conceptual, logical) and programming with Amazon Redshift, Microsoft Azure, Big Data Ecosystem, Oracle PL/SQL, Teradata, Erwin, Power Designer, and OLAP on Hadoop using HDInsight
Experience building ETL data pipelines/ETL workflows on Hadoop/Teradata using Hadoop/Pig/Hive/UDFs
Well-versed in version control and CI/CD tools such as SVN, GIT, SourceTree, Bitbucket, etc.
Experience in Amazon Web Services (AWS) products S3, EC2, EMR, and RDS
Strong experience in the design and development of Business Intelligence solutions using data modelling, dimensional modelling, ETL processes, data integration, OLAP, and client/server applications
Extensive experience in Agile software development methodology
Experience in migrating SQL databases to Azure Data Lake, Azure Data Lake Analytics, Azure SQL Database, Databricks, and Azure SQL Data Warehouse; controlling and granting database access; and migrating on-premises databases to Azure Data Lake Store using Azure Data Factory
Good understanding of Big Data Hadoop and YARN architecture along with various Hadoop daemons such as Job Tracker, Task Tracker, Name Node, Data Node, and Resource/Cluster Manager, as well as Kafka (distributed stream-processing)

TECHNICAL SKILLS

Languages: PL/SQL, SQL, T-SQL, C, C++, XML, HTML, DHTML, HTTP, MATLAB, Python

Databases: SQL Server 2017, MS-Access, Oracle 11g, Sybase and DB2

Database Design Tools and Data Modelling: Fact & Dimensions tables, physical & logical data modelling, Normalization and Denormalization techniques, Kimball

Tools and Utilities: SQL Server 2016/2017, SQL Server Enterprise Manager, TOAD, SQL Server Profiler, Import & Export Wizard, Visual Studio v14, .Net, Microsoft Management Console, Visual SourceSafe 6.0, DTS, Crystal Reports, Power Pivot, ProClarity, Microsoft Office 2007/10/13, Excel Power Pivot, Excel Data Explorer, Tableau 8/10, JIRA

Web Services: REST, SOAP

Development Build & Integration Tools: Eclipse, Maven, Jenkins, IntelliJ, Log4J

Operating Systems: Microsoft Windows 8/7/XP, Linux and UNIX

Cloud Technologies: AWS, Azure

Testing Management Tools: Bugzilla, JIRA, Quality Centre, QTP

SDLC Methodologies: Agile, Scrum, Waterfall

PROFESSIONAL EXPERIENCE

Confidential, Des Moines, IA

Senior Data Engineer

Responsibilities:

Worked on designing and developing the Real-Time Tax Computation Engine using Oracle, StreamSets, Kafka, Spark Structured Streaming, and MySQL
Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster processing of data
Involved in ingestion, transformation, manipulation, and computation of data using StreamSets, Kafka, MySQL, Spark
Authored Python (PySpark) scripts for custom UDFs for row/column manipulations, merges, aggregations, stacking, data labeling, and all cleaning and conforming tasks
Involved in data ingestion into MySQL using a Kafka-MySQL pipeline for full and incremental loads from a variety of sources such as web servers, RDBMS, and data APIs
Worked on Spark data sources, Spark DataFrames, Spark SQL, and Streaming using Scala
Worked extensively on AWS Components such as Elastic Map Reduce (EMR), Elastic Compute Cloud (EC2), Simple Storage Service (S3)
Created several Databricks Spark jobs with PySpark to perform table-to-table operations
Built an end-to-end ETL pipeline from AWS S3 to the key-value store DynamoDB and the Snowflake data warehouse for analytical queries, specifically for cloud data
Experience in developing Spark applications using Scala SBT
Experience in integrating the Spark-MySQL connector and JDBC connector to save the data processed in Spark to MySQL
Responsible for creating tables and automated MySQL pipelines that load data into tables from Kafka topics
Performed a POC to measure the time taken for Change Data Capture (CDC) of Oracle data across Striim, StreamSets, and Dbvisit
Created instances in AWS, migrated data from the data center to AWS using Snowball and AWS migration service, and implemented a generalized solution model using AWS SageMaker
Leveraged AWS SageMaker to build, train, tune, and deploy state-of-the-art Machine Learning and Deep Learning models
Created continuous integration and continuous delivery (CI/CD) pipeline on AWS that helps to automate steps in software delivery process
Developed and deployed the outcome using Spark and Scala code in a Hadoop cluster running on GCP
Expertise in using different file formats like text files, CSV, Parquet, JSON
Experience in custom compute functions using Spark SQL and performed interactive querying
Responsible for masking and encrypting the sensitive data on the fly
Responsible for creating multiple applications for reading the data from different Oracle instances to Kafka topics using Striim
Extensive experience in deploying, managing, and developing MongoDB clusters, including creating, configuring, and monitoring shard sets
Analysed the current-state reporting database (Access-based), identified the front-end user screen functionality, and provided solutions and a detailed summary of the existing database functionality to the business teams. Provided detailed data workflow diagrams for the existing reporting database
Responsible for setting up a MySQL cluster on AWS EC2 Instance
Configured high availability using geographical MongoDB replica sets across multiple data centers
Experience in real-time streaming of data using Spark with Kafka
Imported data from various sources to the Cassandra cluster using Java APIs or Sqoop
Responsible for creating a Kafka cluster using multiple brokers
Experience working on Vagrant boxes to set up local Kafka and StreamSets pipelines

Environment: Spark 2.2, Scala, Linux, MySQL 5.8, Kafka 1.0, Striim, StreamSets, Spark SQL, Spark Structured Streaming, AWS EC2, EMR, IntelliJ, SBT, Git, Vagrant, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, PySpark, SQL, MongoDB, Workday HCM, Workday conversions, Workday Report Writer, Data Modeling

Confidential, Dorchester, MA

Data Engineer

Responsibilities:

Implemented machine learning methods, optimization, visualization, and statistical models such as Regression Models, Decision Tree, Naïve Bayes, Ensemble Classifier, Hierarchical Clustering, and Semi-Supervised Learning on different datasets using Python
Configured a Workday system to meet each client's unique business requirements. Also developed test scripts for outside systems that interface with Workday
Researched and implemented various Machine Learning Algorithms using the R language
Devised a machine learning algorithm using Python for facial recognition
Used R to prototype exploration of sample data and identify the best algorithmic approach, then wrote Scala scripts using the Spark machine learning module
Used Scala scripts to run Spark machine learning library APIs for decision trees, ALS, and logistic and linear regression algorithms
Configured new benefit plans in the Workday system and performed mass uploads of employees to those plans
Integrated data stored in S3 with Databricks to perform ETL processes using PySpark and Spark SQL
Worked on Migrating an on-premises virtual machine to Azure Resource Manager Subscription with Azure Site Recovery
Recreated existing application logic and functionality in the Azure Data Lake, Data Factory, SQL Database, and SQL Data Warehouse environment. Experience in DWH/BI project implementation using Azure Data Factory
Creating Spark clusters and configuring high concurrency clusters using Azure Databricks to speed up the preparation of high-quality data
Extracted data from Azure Data Lake into an HDInsight cluster (Intelligence + Analytics), applied Spark transformations and actions, and loaded the results into HDFS
Provided consulting and cloud architecture for premier customers and internal projects running on MS Azure platform for high availability of services, low operational costs
Developed structured, efficient, and error-free codes for Big Data requirements using Hadoop and its Eco-system
Developed a web service using Windows Communication Foundation and .NET to receive and process XML files, and deployed it as a Cloud Service on Microsoft Azure
Used CosmosDB for partitioning the data for high availability and scalability
Implement ETL process to move data from CosmosDB to SQL Azure Database using SQLizer, SSIS, and SQL Azure Database
Started using Apache NiFi to copy the data from the local file system to HDFS
Analyzed pre-existing predictive model developed by advanced analytics team and factors considered during model development
Focused on Test Driven Development thereby creating detailed JUnit tests for every single piece of functionality before writing the functionality
Involved in preparing logical data models/physical data models
Validated the MapReduce, Pig, and Hive scripts by pulling the data from Hadoop and validating it against the data in the files and reports
Experienced in all phases of data mining: data collection, data cleaning, developing models, validation, and visualization
Analyzed metadata and processed data to get better insights of the data
Created initial data visualizations in Tableau to provide basic insights of data to the project stakeholders
Application of various machine learning algorithms and statistical modeling like decision trees, regression models, clustering, SVM to identify Volume using Scikit-learn package in Python
Conducted regular communications with leaders of other teams to get a better understanding of the data at a deeper level
Extensively worked on the naming standards which incorporated the enterprise data modelling
Developed visualizations using R packages like ggplot2, choroplethr to identify patterns and trends in the preprocessed data
Experienced in RStudio packages and Python libraries like SciKit-Learn to improve the model accuracy from 65% to 86%
Provided conceptual and technical modeling assistance to developers and DBAs using Erwin and Model Mart. Validated data models with IT, team members, and clients
Experienced in various Python libraries like Pandas and NumPy (one-dimensional and two-dimensional arrays)
Experienced in using PyTorch library and implementing natural language processing
Developed data visualizations in Tableau to display day to day accuracy of the model with newly incoming data
Hold a point-of-view on the strengths and limitations of statistical models and analyses in various business contexts and can evaluate and effectively communicate the uncertainty in the results
Used Keras library to build and train deep learning models and fetched good results
Developed a propensity model that delivered a greater ROI compared to other models
Achieved 0.95 million dollars ROI per cycle, with a cycle duration of one quarter
Implemented complete data science project involving data acquisition, data wrangling, exploratory data analysis (EDA), model development, and model evaluation
Worked on various methods including data fusion and machine learning and improved the accuracy of distinguishing right rules from potential rules
Developed Merge jobs in Python to extract and load data into a MySQL database
Used a test-driven approach for developing the application and implemented the unit tests using the Python unittest framework
Designed and documented REST/HTTP, SOAP APIs, including JSON data formats and API versioning strategy
Developed RESTful endpoints to cache application-specific data in in-memory data clusters like Redis
Wrote unit test cases in Python and Objective-C for other API calls in the customer frameworks
Tested various Machine Learning algorithms such as Support Vector Machine (SVM), Random Forest, and trees with XGBoost, and concluded that Decision Trees was the champion model

Environment: Machine Learning, R Language, Hadoop, Big Data, Azure, Python, PySpark, Java, J2EE, Spring, Struts, JSF, Dojo, JavaScript, DB2, CRUD, PL/SQL, JDBC, Coherence, MongoDB, Apache CXF, SOAP, Web Services, Eclipse, MS Access, Teradata, Advanced SQL, RStudio (ggplot2, caret), Tableau, Excel, Workday HCM, Workday conversions, Workday Report Writer

Confidential

Data Analyst

Responsibilities:

Collected data from the end client, performed ETL, and defined the uniform standard format
Wrote queries to retrieve data from SQL Server database to get the sample dataset containing needed fields
Performed string formatting on the dataset converting hours from date format to a numerical integer
Used Python libraries like Matplotlib and Seaborn to visualize the numerical columns of the dataset such as day of the week, age, hour, and number of screens
Created VBA programs to automatically update Excel workbooks, encompassing class and program modules and external data queries
Developed and implemented predictive models like Logistic Regression, Decision Tree, and Support Vector Machine (SVM) to predict the probability of enrollment
Used ensemble learning methods like Random Forest, Bagging, and Gradient Boosting to predict the probability of customer enrollment, and selected the final model based on confusion matrix, ROC, and AUC
Worked on missing value imputation, outlier identification with statistical methodologies using Pandas, NumPy
Tuned the hyperparameters of the above models using Grid Search to find the optimum models
Designed and implemented K-Fold Cross-validation to test and verify the model’s significance
Developed a dashboard and story in Tableau showing the benchmarks and summary of the model’s measure
Extensively used tools like R, Python, ODS, DB2, Metadata, and MS Excel to analyze data from multiple perspectives and provide a robust Machine Learning algorithm
Created new tools and business processes that simplify, standardize, and enable operational excellence
Used tools like Tableau for drilling down into data, creating insightful reports, and garnering actionable business insights
Documented business requirements, technical requirements, application and data workflows, use cases, and test plans
Performed database testing and report-level testing as per the requirements, with excellent knowledge of the data workflow gained by referring to FSDs (Functional Specification Documents)
Excellent understanding of the mapping between Source and Target by referring to the mapping document
Performed end to end mapping testing for the database as well as reports
The mapping is one-to-one and follows a lift-and-shift process, which means checking whether the data gathered in the target table is mapped properly to the source table and whether the same target table populates the same records into the reporting tool (SAP-BO, QlikView) properly
Performed Smoke test to do the primary checks like record counts, column matching for database, and dashboard testing
Worked with data owners, Business Units, Data Integration team and customers in fast paced Agile/Scrum environment
Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and coordinated tasks among the team
Performed testing at SIT (System Integration Testing) level and UAT (User Acceptance testing) level
Gathered requirements from the development team and database developers to analyze the tables and entity relationships for understanding the database
Designed the integration document/XLS to derive the input and output of each of the integration points
Documented the acceptance criteria for each of the test cases. Built the test cases based on test scenarios
Created a test plan and strategy for the given LOB (Line of Business)
Wrote queries in BigQuery to look up all the Customer, Product, and Order level data
Verified import/export and obfuscation data
Verified known issues and developed workarounds and wrappers as required
Identified data scenarios and business cases, and developed test cases
Scripted and automated test cases and identified source data patterns for generating reports
Developed scripts for comparison with the target. Planned and ran the SIT (System Integration Testing) for the given LOB
Utilized Kubernetes and Docker for the runtime environment for the CI/CD system to build, test, and deploy
Developed test cases, established traceability between requirements and test cases
Performed Data Analysis, Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export through Python
Provided inputs to the test lead for documentation and reporting purposes
Identified, documented, and updated testing dependencies and participants
Identified primary point of contact to raise the risks/issues around testing dependencies
Reported status on test execution including risks/issues and targets
Updated latest information in regular testing status meetings with all involved constituencies to ensure smooth test execution and timely issue resolution

Environment: Informatica Power Center, HP-ALM, SharePoint, MS-Visio, MS-Excel, Teradata SQL Assistant, QlikView, SAP-BO, Oracle 11g, Microsoft SQL Server, Tableau report builder, MS Outlook, SQL Server 2012/2014, Python (Scikit-Learn, NumPy, Pandas, Matplotlib, Dateutil, Seaborn), Tableau, Hadoop
