
Data Engineer Resume


Santa Fe, NM

SUMMARY

  • Overall, 8+ years of technical IT experience in all phases of Software Development Life Cycle (SDLC) with skills in data analysis, design, development, testing and deployment of software systems.
  • 6+ years of industrial experience in Big Data analytics and data manipulation using Hadoop ecosystem tools: MapReduce, HDFS, YARN/MRv2, Pig, Hive, HBase, Spark, Kafka, Flume, Sqoop, Oozie, Avro, AWS, Spring Boot, Spark integration with Cassandra, Solr, and Zookeeper.
  • Experience in developing data pipelines using AWS services including EC2, S3, Redshift, Glue, Lambda functions, Step functions, CloudWatch, SNS, DynamoDB, and SQS.
  • Used Informatica PowerCenter for extraction, transformation, and loading (ETL) of data from heterogeneous source systems.
  • Proficiency in multiple databases like MongoDB, Cassandra, MySQL, Oracle, and MS SQL Server. Worked on different file formats like delimited files, Avro, JSON, and Parquet. Performed Docker container orchestration using ECS, ALB, and Lambda.
  • Extensive knowledge of QlikView Enterprise Management Console (QEMC), QlikView Publisher, and QlikView Web Server.
  • Implemented a batch process for heavy-volume data loading with the Apache NiFi dataflow framework in an Agile development methodology.
  • Implemented ETL processes in Alteryx to extract data from multiple sources (SQL Server, XML, Excel, CSV) and scheduled workflows.
  • 4.2 years of experience in IBM BPM, including design and development of IBM BPM-driven applications.
  • Experience in Business Process Management using IBM BPM 8.5.7, IBM BPM 8.5.6, and IBM BPM 7.5.2.
  • Created visualizations in QlikView.
  • Experienced in backend and ETL testing (SQL, PL/SQL, DB2 and Oracle DB)
  • Worked as team JIRA administrator providing access, working assigned tickets, and teaming with project developers to test product requirements/bugs/new improvements.
  • Experienced in Pivotal Cloud Foundry (PCF) on Azure VMs to manage the containers created by PCF.
  • Hands-on experience in test-driven development (TDD), behavior-driven development (BDD), and acceptance test-driven development (ATDD) approaches.
  • Managed databases and Azure Data Platform services (Azure Data Lake (ADLS), Data Factory (ADF), Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB), SQL Server, Oracle, and data warehouses; built multiple data lakes.
  • Strong technical knowledge with hands-on experience in CI/CD using different DevOps toolsets.
  • Extensive experience in Text Analytics, generating data visualizations using R, Python and creating dashboards using tools like Tableau, PowerBI.
  • Worked extensively with ETL Testing including Data Completeness, Data Correctness, Data Transformation, Data Quality with analyzing feed files in Downstream scope (Balance, Transactions and Master data type of feeds)
  • Worked with Google Compute Cloud Data Flow and BigQuery to manage and move data within a 200-petabyte cloud data lake for GDPR compliance. Also designed a star schema in BigQuery.
  • Extensive programming expertise in designing and developing web-based applications using Spring Boot, Spring MVC, Java servlets, JSP, JTS, JTA, JDBC and JNDI.
  • Experience in MVC and Microservices Architecture with Spring Boot and Docker, Swamp.
  • Expertise in Java programming and have a good understanding on OOPs, I/O, Collections, Exceptions Handling, Lambda Expressions, Annotations
  • Experience in Spring Frameworks like Spring Boot, Spring LDAP, Spring JDBC, Spring Data JPA, Spring Data REST
  • Experience in writing code in R and Python to manipulate data for data loads, extracts, statistical analysis, modeling, and data munging.
  • Managed end-to-end complex data migration, conversion, and data modeling (using Alteryx, SQL), and created visualizations using Tableau to develop high-quality dashboards.
  • Familiar with latest software development practices such as Agile Software Development, Scrum, Test Driven Development (TDD) and Continuous Integration (CI).
  • Created charts, graphs, pie charts, and trends in QlikView.
  • Created ETL test data for all ETL mapping rules to test the functionality of the Ab Initio graphs.
  • Utilized Kubernetes and Docker for the runtime environment for the CI/CD system to build, test, and deploy. Experience in working on creating and running docker images with multiple microservices.
  • Utilized analytical applications like R, SPSS, Rattle and Python to identify trends and relationships between different pieces of data, draw appropriate conclusions and translate analytical findings into risk management and marketing strategies that drive value.
  • Extensive hands-on experience in using distributed computing architectures such as AWS products (e.g., EC2, Redshift, EMR, and Elasticsearch), Hadoop, Python, and Spark, and effective use of Azure SQL Database, MapReduce, Hive, SQL, and PySpark to solve big data problems.
  • Strong experience in Microsoft Azure Machine Learning Studio for data import, export, data preparation, exploratory data analysis, summary statistics, feature engineering, Machine learning model development and machine learning model deployment into Server system.
  • Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design and implementing RDBMS specific features.
  • Knowledge of working with proofs of concept (PoCs) and gap analysis; gathered necessary data for analysis from different sources and prepared data for exploration using data manipulation and Teradata.
  • Well experienced in normalization and de-normalization techniques for optimum performance in relational and dimensional database environments.
  • Developed Complex Graphical User Interactive reports using various QlikView Objects.
  • Set up Alteryx server & managed server/configuration
  • Expertise in migrating homegrown build and deploy pipelines to more modern CICD patterns
  • Experience in developing customized UDFs in Python to extend Hive and Pig Latin functionality (a Hive streaming UDF sketch follows this list).
  • Experienced in building automation regression scripts for validation of ETL processes between multiple databases like Oracle, SQL Server, Hive, and MongoDB using Python (a validation sketch also follows this list).
  • Proficiency in SQL across several dialects, including MySQL, PostgreSQL, Redshift, SQL Server, and Oracle.
  • Developed JSON scripts for deploying pipelines in Azure Data Factory (ADF) that process data using the SQL activity. Built an ETL job that invokes a Spark JAR to execute the business analytical model.
  • Skilled in performing data parsing, data ingestion, data manipulation, data architecture, data modeling, and data preparation with methods including describing data contents, computing descriptive statistics, regex, split and combine, remap, merge, subset, reindex, melt, and reshape.
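
The Hive extension work above is typically wired up through Hive's TRANSFORM clause, which streams rows through a script. Below is a minimal, hypothetical sketch of such a streaming UDF; the column names, table, and script name are illustrative assumptions, not the actual project code.

```python
#!/usr/bin/env python3
# Hypothetical Hive streaming "UDF": normalize a phone_number column.
# Registered and invoked from Hive roughly as:
#   ADD FILE normalize_phone.py;
#   SELECT TRANSFORM(user_id, phone_number)
#   USING 'python3 normalize_phone.py' AS (user_id, phone_clean) FROM contacts;
import re
import sys

for line in sys.stdin:
    user_id, phone_number = line.rstrip("\n").split("\t")
    digits = re.sub(r"\D", "", phone_number)   # keep digits only
    print(f"{user_id}\t{digits}")
```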
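The cross-database ETL validation work can be scripted as a simple reconciliation check. The sketch below compares row counts between an Oracle source table and its Hive target using the cx_Oracle and PyHive clients; the connection strings, hosts, and table names are placeholders, not the real project configuration.

```python
# Minimal ETL regression check: source vs. target row counts (placeholder names).
import cx_Oracle                # Oracle client driver
from pyhive import hive         # HiveServer2 client

ORACLE_DSN = "etl_user/secret@ora-host:1521/ORCLPDB"   # hypothetical DSN
HIVE_HOST = "hive-host"                                 # hypothetical host


def oracle_count(table):
    conn = cx_Oracle.connect(ORACLE_DSN)
    try:
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]
    finally:
        conn.close()


def hive_count(table):
    conn = hive.connect(host=HIVE_HOST, port=10000, username="etl_user")
    try:
        cur = conn.cursor()
        cur.execute(f"SELECT COUNT(*) FROM {table}")
        return cur.fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    assert oracle_count("sales.orders") == hive_count("dw.orders"), "row counts diverged"
    print("row-count check passed")
```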

TECHNICAL SKILLS

Big Data Ecosystem: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Kafka, Flume, Cassandra, Impala, Oozie, Zookeeper, MapR, Amazon Web Services (AWS), EMR, IBM BPM v8.5.7

Machine Learning Classification Algorithms: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbor (KNN), Gradient Boosting Classifier, Extreme Gradient Boosting Classifier, Support Vector Machine (SVM), Artificial Neural Networks (ANN), Naïve Bayes Classifier, Extra Trees Classifier, Stochastic Gradient Descent, etc.

Cloud Technologies: AWS, Azure, Google cloud platform (GCP)

IDE’s: IntelliJ, Eclipse, Spyder, Jupyter

Ensemble and Stacking: Averaged Ensembles, Weighted Averaging, Base Learning, Meta Learning, Majority Voting, Stacked Ensemble, AutoML - Scikit-Learn, MLjar, etc.

Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server, HBASE

Programming / Query Languages: Java, SQL, Python Programming (Pandas, NumPy, SciPy, Scikit-Learn, Seaborn, Matplotlib, NLTK), NoSQL, PySpark, PySpark SQL, SAS, R Programming (Caret, Glmnet, XGBoost, rpart, ggplot2, sqldf), RStudio, PL/SQL, Linux shell scripts, Scala.

Data Engineer/Big Data Tools / Cloud / Visualization / Other Tools: Databricks, Hadoop Distributed File System (HDFS), Hive, Pig, Sqoop, MapReduce, Spring Boot, Flume, YARN, Hortonworks, Cloudera, Mahout, MLlib, Oozie, Zookeeper, etc. AWS, Azure Databricks, Azure Data Explorer, Azure HDInsight, Salesforce, NiFi, GCP, Google Shell, Linux, BigQuery, Bash Shell, Unix, Tableau, Power BI, SAS, Web Intelligence, Crystal Reports, Dashboard Design.

PROFESSIONAL EXPERIENCE

Confidential, Santa Fe, NM

Data Engineer

Responsibilities:

  • Performed data analysis and developed analytic solutions; investigated data to discover correlations and trends and explain them.
  • Worked with Data Engineers and Data Architects to define back-end requirements for data products (aggregations, materialized views, tables, visualization)
  • Developed frameworks and processes to analyze unstructured information. Assisted in Azure Power BI architecture design
  • Experienced with machine learning algorithms such as logistic regression, random forest, XGBoost, KNN, SVM, neural networks, linear regression, lasso regression, and k-means
  • Implemented Statistical model and Deep Learning Model (Logistic Regression, XGboost, Random Forest, SVM, RNN, and CNN).
  • Certified in IBM BPM 8.0 and IBM BPM 8.5.
  • Designing and Developing Oracle PL/SQL and Shell Scripts, Data Import/Export, Data Conversions and Data Cleansing.
  • Architected and implemented medium to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics).
  • Performing data analysis, statistical analysis, generated reports, listings and graphs using SAS tools, SAS/Graph, SAS/SQL, SAS/Connect and SAS/Access.
  • Developing Spark applications using Scala and Spark-SQL for data extraction, transformation, and aggregation from multiple file formats. Using Kafka and integrating with Spark Streaming. Developed data analysis tools using SQL and Python code.
  • Designing Data Mart and Data Warehouse and Implementation of QlikView for BI reporting Solutions.
  • Created and placed Alteryx Server on an MS Azure VM
  • Created Alteryx workflows with advanced analytics
  • Tested the ETL Ab Initio mappings and other ETL processes (DW testing).
  • Authored Python (PySpark) scripts with custom UDFs for row/column manipulations, merges, aggregations, stacking, data labeling, and all cleansing and conforming tasks. Migrated data from on-premises systems to AWS storage buckets.
  • Involved in the installation of QlikView 12.0 SR5 and NPrinting 16/17 on both Publisher and Server.
  • Involved in testing dashboards of QlikView version 11.2 to migrate to QlikView 12.1. Extensive experience with the Extraction, Transformation, and Loading (ETL) process using Ascential DataStage EE/8.0.
  • Developed a Python script using REST APIs to extract and transfer data from on-premises systems to AWS S3. Implemented a microservices-based cloud architecture using Spring Boot.
  • Worked on ingesting data through cleansing and transformations, leveraging AWS Lambda, AWS Glue, and Step Functions.
  • Created YAML files for each data source, including Glue table stack creation. Worked on a Python script to extract data from Netezza databases and transfer it to AWS S3.
  • Developed Lambda functions and assigned IAM roles to run Python scripts along with various triggers (SQS, EventBridge, SNS); a Lambda handler sketch follows this list.
  • Experienced in onboarding a wide variety of technologies (Java, .NET, Node.js, etc.) onto the CI/CD pipeline.
  • Wrote UNIX shell scripts to automate jobs and scheduled cron jobs using crontab. Created a Lambda deployment function and configured it to receive events from S3 buckets.
  • Built machine learning models, including SVM, random forest, and XGBoost, to score and identify potential new business cases with Python scikit-learn.
  • Worked on a migration project from TeamWorks 6.2 to IBM BPM 8.5.7.
  • Oversaw the migration of the database from the staging area to the data warehouse using an ETL tool (Informatica).
  • Experience in converting existing AWS infrastructure to serverless architecture (AWS Lambda, Kinesis), deploying via Terraform and AWS CloudFormation templates.
  • Worked on Docker container snapshots, attaching to a running container, removing images, managing directory structures, and managing containers.
  • Involved in Optimizing existing QlikView dashboards with a focus on usability, performance, long-term, flexibility, and standardization.
  • Experienced in day-to-day DBA activities including schema management, user management (creating users, synonyms, privileges, roles, quotas, tables, indexes, sequences), space management (tablespaces, rollback segments), monitoring (alert log, memory, disk I/O, CPU, database connectivity), scheduling jobs, and UNIX shell scripting.
  • Developed normalized logical and physical database models to design an OLTP system for insurance applications.
  • Coordinated with different data providers to source the data and build the Extraction, Transformation, and Loading (ETL) modules based on the requirements to load the data from source to stage and performed Source Data Analysis.
  • Analyzed existing Data Model and accommodated changes according to the business requirements.
  • Decommissioned two data marts and expanded the data model of an existing Oracle DW used as their data mining and metrics repository.
  • Created analytical applications in Alteryx Designer and stored them on Alteryx Server for non-technical users.
  • Tested ETL with Front office data as source and tables in the settlement databases warehouse as target.
  • Created a dimensional model for the reporting system by identifying required dimensions and facts using Erwin.
  • Developed complex Talend ETL jobs to migrate data from flat files to databases. Pulled files from the mainframe into the Talend execution server using multiple FTP components.
  • Developed Talend ESB services and deployed them on ESB servers on different instances.
  • Developed merge scripts to UPSERT data into Snowflake from an ETL source (a Snowflake MERGE sketch follows this list).
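
As referenced above, the event-driven Lambda work can be illustrated with a small handler. This is a hypothetical sketch assuming an SQS trigger and an S3 landing bucket whose names are placeholders; the attached IAM role is assumed to grant s3:PutObject.

```python
# Hypothetical AWS Lambda handler: triggered by SQS, writes each message body
# to a landing prefix in S3. Bucket and key prefix are placeholders.
import json
import boto3

s3 = boto3.client("s3")
BUCKET = "curated-data-bucket"        # placeholder bucket name


def handler(event, context):
    records = event.get("Records", [])
    for record in records:
        payload = json.loads(record["body"])                  # SQS message body
        key = f"landing/{record['messageId']}.json"
        s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(payload))
    return {"processed": len(records)}
```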
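The Snowflake UPSERT bullet above can be realized as a MERGE statement issued through the Snowflake Python connector. The sketch below assumes a staging table loaded by the ETL step; the account, credentials, and table names are placeholders.

```python
# Hypothetical UPSERT into Snowflake via MERGE (placeholder account and tables).
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345", user="etl_user", password="***",     # placeholders
    warehouse="ETL_WH", database="ANALYTICS", schema="PUBLIC",
)

MERGE_SQL = """
MERGE INTO dim_customer AS tgt
USING stg_customer AS src
  ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN UPDATE SET
  tgt.email = src.email, tgt.updated_at = src.updated_at
WHEN NOT MATCHED THEN INSERT (customer_id, email, updated_at)
  VALUES (src.customer_id, src.email, src.updated_at)
"""

cur = conn.cursor()
try:
    cur.execute(MERGE_SQL)   # upsert staged rows into the dimension table
finally:
    cur.close()
    conn.close()
```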

Environment: Hadoop, Map Reduce, HDFS, Hive, Ni-fi, Spring Boot, Cassandra, Swamp, Data Lake, Sqoop, Oozie, SQL, Kafka, Spark, Scala, Java, AWS, GitHub, Talend Big Data Integration, Solr, Impala.

Confidential, Chicago, Illinois

Data Engineer

Responsibilities:

  • Transformed business problems into Big Data solutions and defined the Big Data strategy and roadmap. Installed, configured, and maintained data pipelines.
  • Responsible for creating CICD patterns for deploying various MQ objects as code
  • Responsible for creating CICD pattern for DB object migration via CICD
  • Developed the features, scenarios, and step definitions for BDD (Behavior Driven Development) and TDD (Test Driven Development) using Cucumber, Gherkin, and Ruby.
  • Designing the business requirement collection approach based on the project scope and SDLC methodology.
  • Creating Pipelines in ADF using Linked Services/Datasets/Pipeline/ to Extract, Transform, and load data from different sources like Azure SQL, Blob storage, Azure SQL Data warehouse, write-back tool and backwards.
  • Extracted files from Hadoop.
  • Created ETL between different data warehouses such as Snowflake and Redshift via Alteryx workflows.
  • Experience in deploying the Spring Boot Microservices to Pivotal Cloud Foundry (PCF) using build pack and Jenkins for continuous integration, Deployments in Pivotal Cloud Foundry (PCF) and binding of Services in Cloud and Installed Pivotal Cloud Foundry (PCF) on Azure to manage the containers created by PCF.
  • Analyzed clickstream data from Google Analytics with BigQuery. Designed APIs to load data from Omniture, Google Analytics, and Google BigQuery (a BigQuery query sketch follows this list).
  • Maintained JIRA team and program management review dashboards and maintained COP account and JIRA team sprint metrics reportable to customer and SAIC division management, and COP User Metrics.
  • Experience managing Azure Data Lakes (ADLS) and Data Lake Analytics and an understanding of how to integrate with other Azure services. Knowledge of U-SQL.
  • Created functions and assigned roles in AWS Lambda to run Python scripts, and AWS Lambda using Java to perform event-driven processing. Created Lambda jobs and configured roles using AWS CLI.
  • Data visualization: Pentaho, Tableau, D3. Knowledge of numerical optimization, anomaly detection and estimation, A/B testing, statistics, and Maple. Big data analysis using Hadoop, MapReduce, NoSQL, Pig/Hive, Spark/Shark, MLlib, Scala, NumPy, SciPy, Pandas, and scikit-learn.
  • Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python, and utilized the engine to increase user lifetime by 45% and triple user conversions for target categories.
  • Used Apache Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and MLlib libraries (a PySpark sketch follows this list).
  • Set up Alteryx server & managed server/configurations
  • Data integration ingests, transforms, and integrates structured data and delivers it to a scalable data warehouse platform, using traditional ETL (Extract, Transform, and Load) tools and methodologies to collect data from various sources into a single data warehouse.
  • Applied various machine learning algorithms and statistical modeling like decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, regression models, social network analysis, neural networks, deep learning, SVM, and clustering to identify volume, using the scikit-learn package in Python, R, and Matlab. Collaborated with Data Engineers and Software Developers to develop experiments and deploy solutions to production.
  • Created and published multiple dashboards and reports using Tableau Server and worked on text analytics, Naive Bayes, sentiment analysis, creating word clouds, and retrieving data from Twitter and other social networking platforms.
  • Worked on data that was a combination of unstructured and structured data from multiple sources and automated the cleaning using Python scripts.
  • Actively participated in the design, architecture and development of user interface objects in QlikView applications.
  • Created User manual on using Atlassian Products (Jira/Confluence) and trained end users project wise.
  • Implemented the Atlassian Stash application as the SCM tool of choice for central repository management
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Used SQL Server Integrations Services (SSIS) for extraction, transformation, and loading data into target system from multiple sources
  • Involved in unit testing the code and provided feedback to the developers. Performed unit testing of the application using NUnit.
  • Developed data mapping, data governance, transformation, and cleansing rules for the Master Data Management architecture involving OLTP, ODS, and OLAP
  • Migrated databases from SQL databases (Oracle and SQL Server) to NoSQL databases (Cassandra/MongoDB).
  • Studied the existing OLTP systems (3NF models) and created facts and dimensions in the data mart. Worked with different cloud-based data warehouses like SQL and Redshift.
  • Wrote research reports describing the experiments conducted, results, and findings, and made strategic recommendations to technology, product, and senior management. Worked closely with regulatory delivery leads to ensure robustness in prop trading control frameworks using Hadoop, Python Jupyter Notebook, Hive, and NoSQL.
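
The Google Analytics clickstream analysis mentioned above is commonly run through the BigQuery Python client. Here is a minimal, hypothetical sketch; the project, dataset, and table names are placeholders, not the actual GA export used on the project.

```python
# Hypothetical BigQuery query over a GA-style events export (placeholder names).
from google.cloud import bigquery

client = bigquery.Client()   # uses application-default credentials

QUERY = """
SELECT DATE(TIMESTAMP_MICROS(event_timestamp)) AS day,
       COUNT(DISTINCT user_pseudo_id) AS users
FROM `my-project.analytics_123456.events_*`      -- placeholder table
GROUP BY day
ORDER BY day
"""

for row in client.query(QUERY).result():
    print(row.day, row.users)
```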
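The Spark DataFrame / Spark SQL work referenced above can be sketched in PySpark as a simple clickstream aggregation; the S3 paths and column names below are assumptions for illustration only.

```python
# Hypothetical PySpark aggregation: daily sessions and page views (placeholder paths).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("clickstream-agg").getOrCreate()

events = spark.read.parquet("s3a://raw-zone/clickstream/")   # placeholder input path
events.createOrReplaceTempView("events")

daily = spark.sql("""
    SELECT to_date(event_ts)            AS event_date,
           COUNT(DISTINCT session_id)   AS sessions,
           COUNT(*)                     AS page_views
    FROM events
    GROUP BY to_date(event_ts)
""")

daily.write.mode("overwrite").parquet("s3a://curated-zone/clickstream_daily/")
```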

Environment: Hadoop, Kafka, Spark, Sqoop, Docker, Swamp, BigQuery, Spark SQL, TDD, Spark-Streaming, Hive, Scala, Pig, NoSQL, Impala, Oozie, HBase, Data Lake, Zookeeper.

Confidential, Tampa, FL

Data Scientist/ R Programmer

Responsibilities:

  • Gathered business requirements, worked on the definition and design of the data sourcing, and worked with the data warehouse architect on the development of logical data models.
  • Automated Diagnosis of Blood Loss during Emergencies and developed Machine Learning algorithm to diagnose blood loss.
  • Extensively used Agile methodology as the Organization Standard to implement the data Models. Used Micro service architecture with Spring Boot based services interacting through a combination of REST and Apache Kafka message brokers.
  • Created several types of data visualizations using Python and Tableau. Extracted Mega Data from AWS using SQL Queries to create reports.
  • Involved in creating the front end for an ETL validation tool using the Django framework.
  • Performed reverse engineering using Erwin to redefine entities, attributes, and relationships in the existing database.
  • Analyzed functional and non-functional business requirements and translate into technical data requirements and create or update existing logical and physical data models. Developed a data pipeline using Kafka to store data into HDFS.
  • Performed regression testing for golden test cases from the State (end-to-end test cases) and automated the process using Python scripts.
  • Administer Alteryx Gallery Platforms & Data Connections
  • Alteryx Systems Settings, ODBC & DSN connections to different data sources
  • Developed Spark jobs using Scala for faster real-time analytics and used Spark SQL for querying
  • Generated graphs and reports using gplot package in RStudio for analytical models. Developed and implemented R and Shiny application which showcases machine learning for business forecasting.
  • Developed predictive models using Decision Tree, Random Forest, and Naïve Bayes.
  • Used pandas, NumPy, seaborn, SciPy, Matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms (a scikit-learn sketch follows this list). Expertise in R, Matlab, Python, and their respective libraries.
  • Involved in creating ETL validation script in Python.
  • Researched reinforcement learning and control (TensorFlow, Torch) and machine learning models (scikit-learn).
  • Hands-on experience in implementing Naive Bayes and skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, and Principal Component Analysis.
  • Performed K-means clustering, regression, and decision trees in R. Worked on data cleaning and reshaping, and generated segmented subsets using NumPy and Pandas in Python.
  • Implemented various statistical techniques to manipulate the data, such as missing-data imputation, principal component analysis, and sampling.
  • Worked on R packages to interface with Caffe Deep Learning Framework. Perform validation on machine learning output from R.
  • Applied different dimensionality reduction techniques like principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) on the feature matrix (a PCA/t-SNE sketch follows this list).
  • Responsible for design and development of Python programs/scripts to prepare transform and harmonize data sets in preparation for modeling.
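
As a companion to the modeling bullets above, here is a minimal scikit-learn sketch of training and evaluating a Random Forest classifier. The CSV file and target column are placeholders, not the project's actual data.

```python
# Hypothetical predictive-model training: Random Forest with a hold-out split.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("training_data.csv")               # placeholder dataset
X = df.drop(columns=["target"])                     # placeholder target column
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = RandomForestClassifier(n_estimators=300, random_state=42)
model.fit(X_train, y_train)
print("hold-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
```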
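The dimensionality-reduction step referenced above can be sketched as follows: scale the feature matrix, project it with PCA, then embed it in two dimensions with t-SNE for visualization. The random feature matrix stands in for the real data.

```python
# Hypothetical PCA + t-SNE pipeline on a placeholder feature matrix.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.rand(500, 40)                          # placeholder feature matrix

X_scaled = StandardScaler().fit_transform(X)
X_pca = PCA(n_components=10).fit_transform(X_scaled)            # keep 10 components
X_2d = TSNE(n_components=2, perplexity=30.0).fit_transform(X_pca)
print(X_2d.shape)                                    # (500, 2)
```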

Environment: Spark, YARN, HIVE, Pig, Scala, Mahout, NiFi, TDD, Python, Spring Boot, Hadoop, Azure, Dynamo DB, Kibana, NOSQL, Sqoop, MYSQL.

Confidential, Rochester, MN

Java Developer

Responsibilities:

  • Developed front-end screens using JSP, HTML, CSS, JavaScript, JSON.
  • Developed SCM using JSP/HTML (one form per functional user interface), standard validations using JavaScript, servlets as controllers for the business logic, and business logic using JDBC and XML parsing techniques, following MVC.
  • Developed Single Sign On (SSO) functionality, through which we can run SCM from Oracle Applications.
  • Developed Server-Side components for the business services for creating Items, BOM, Sourcing Rules, and substitute.
  • Involved in developing the routings and configured the routing program as a scheduled concurrent request.
  • Involved in raising notifications to Oracle users through the mailing concept to signal the start of the next process using workflow.
  • Oversaw the migration of the database from the staging area to the data warehouse using an ETL tool (Informatica).
  • Extensively worked on creating the setups for Organizations, Templates, Concurrent Requests, Cross Reference Types, User Creations, assigning responsibilities, creating value sets, Descriptive Flex Fields etc., in Oracle Applications.
  • Used CVS as version control system.
  • Implemented the Struts MVC design pattern and front controller pattern, with ActionServlet as the front controller for this application.

Environment: Java, JDBC, Servlets, Oracle, JSP, XML, UML, HTML, CSS, JavaScript, JSON, UNIX, CVS, DB2 and Ionic Framework, Struts MVC, Action Servlet.
