
Data Engineer Resume


Phoenix, Arizona

SUMMARY

  • Over 5 years of experience in Data Analysis, Decision Trees, Random Forest, Data Profiling, Data Integration, Data Governance, Migration, Metadata Management, Master Data Management, and Configuration Management.
  • Experience in various phases of the Software Development Life Cycle (Analysis, Requirements Gathering, Design) with expertise in documenting requirement specifications, functional specifications, Data Validation, Test Plans, Source-to-Target Mappings, SQL Joins, and Data Cleansing.
  • Extensive experience in Text Analytics, developing various Statistical Machine Learning and Data Mining solutions to business problems, and generating data visualizations using R and Python.
  • Executive experience performing IT roles for various industry leaders; acquired a broad range of skills in:
  • Machine learning algorithms and statistical analysis.
  • Proficient in data mining tools such as R, SAS, Python, SQL, SQLAlchemy, Excel, and the Big Data Hadoop ecosystem
  • Staff leadership and development
  • Proficient in advising on the use of data for compiling personnel and statistical reports and preparing personnel action documents
  • Identifying patterns within data, analyzing data, and interpreting results
  • Strong ability to analyze sets of data for signals, patterns, ways to group data to answer questions and solve complex data puzzles
  • Skilled in Advanced Regression Modeling, Time Series Analysis, Statistical Testing, Correlation, Multivariate Analysis, Forecasting, Model Building, Business Intelligence tools, and the application of Statistical Concepts
  • Proficient in data entry, data auditing, creating data reports & monitoring data for accuracy
  • Ability to perform web search and data collection, web data mining, database extraction from websites, data entry, and data processing.
  • Possess functional knowledge in the areas of business process study, requirement capture, analysis, project documentation and training
  • Highly competent at researching, visualizing and analyzing raw data to identify recommendations for meeting organizational challenges.
  • Experience in importing and exporting data using Sqoop from HDFS to RDBMS and vice versa
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables, and OLAP reporting.
  • Expert in data flow between the primary database and various reporting tools; expert in finding trends and patterns within datasets and providing recommendations accordingly.
  • Ability to apply dimensionality reduction and regularization techniques (see the Python sketch following this summary).
  • Independently handle Hadoop administration locally and in the cloud in Linux environments.
  • Extracting and modeling datasets from a variety of data sources such as Hadoop (using Pig, Hive, Spark), Teradata, and Snowflake for ad-hoc analysis; fair understanding of Agile methodology and practice.
  • Working knowledge of application design, architecture, and development.
  • Experienced in complete SDLC and STLC with end-user interaction for functional specification, system analysis, and unit regression testing; participated in system integration testing
  • Experienced in working in a team environment to deliver on demand service; ability to deliver appropriate quality solutions under pressure; pro-active and strong analytical problem-solving skills
  • Participated in portfolio meetings; experienced in preparing hi-level design documents, low-level design documents, and detailed technical design documents using case scenarios.
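
The dimensionality reduction and regularization skills listed above can be illustrated with a brief Python sketch. This is a minimal, hypothetical example with synthetic data and arbitrary parameters, not code from any engagement described in this resume.

```python
# Minimal sketch: PCA for dimensionality reduction feeding a Ridge model for
# L2 regularization. Data, feature counts, and hyperparameters are placeholders.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 40))                               # synthetic features
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.5, size=500)   # synthetic target

model = Pipeline([
    ("scale", StandardScaler()),    # standardize before PCA
    ("pca", PCA(n_components=10)),  # compress 40 features into 10 components
    ("ridge", Ridge(alpha=1.0)),    # regularization strength alpha is arbitrary
])

print("mean cross-validated R^2:", cross_val_score(model, X, y, cv=5).mean())
```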

TECHNICAL SKILLS

Big Data Technologies: Hadoop, Hive, MapReduce

Machine Learning: Regression (Simple/Multiple Linear, Polynomial, Logistic), Random Forest, Decision Trees, Classification, Clustering, Association, Kernel SVM, K-Nearest Neighbors (K-NN).

OLAP/BI/ETL Tools: Business Objects 6.1/XI, MS SQL Server 2008/2005 Analysis Services (MS OLAP, SSAS), Integration Services (SSIS), Reporting Services (SSRS), Performance Point Server (PPS), Oracle 9i OLAP, MS Office Web Components (OWC11), DTS, MDX, Crystal Reports 10, Crystal Enterprise 10 (CMC)

BI Tools: Tableau, Tableau Server, Tableau Reader, SAP Business Objects, OBIEE, QlikView, SAP Business Intelligence, Amazon Redshift, Azure Data Warehouse

Packages: ggplot2, caret, dplyr, RWeka, gmodels, RCurl, tm, C50, twitteR, NLP, reshape2, rjson, plyr, pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, Beautiful Soup, rpy2, SQLAlchemy.

Web Technologies: JDBC, HTML5, DHTML and XML, CSS3, Web Services, WSDL

Tools: Erwin r9.6/9.5/9.1/8.x, Rational Rose, ER/Studio, MS Visio, SAP PowerDesigner.

Languages: Java 8, Python, R

Databases: SQL, Hive, Impala, Pig, Spark SQL, SQL Server, MySQL, MS Access, HDFS, SQLAlchemy

Reporting Tools: MS Office (Word/Excel/PowerPoint/Visio), Tableau, Crystal Reports XI, Business Intelligence, SSRS, Business Objects 5.x/6.x, Cognos 7.0/6.0.

Version Control Tools: SVN, GitHub.

PROFESSIONAL EXPERIENCE

Confidential, Phoenix, Arizona

Data Engineer

Responsibilities:

  • Understand and analyze business data requirements, architect an accurate, extensible, flexible, and logical data model, and define and implement conceptual, logical, and physical data modeling concepts.
  • Defining Data Sources and data models, documenting actual data flows, data exchanges, and systems interconnections and interfaces. Ensuring these are aligned with the enterprise data model.
  • Develop and optimize ETL processes by working closely with multiple data partners and stakeholders across the company to meet growing business needs.
  • Design and build high-volume, real-time data ingestion frameworks and automate ingestion from various data sources into big data technologies such as Hadoop.
  • Performed data mapping from source systems to target systems and logical data modeling, created class diagrams, and used SQL queries to filter data.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Identify and predict technology needs related to data and reporting for the organization, and propose technology solutions.
  • Handled the import of data from various data sources and performed transformations using Hive (external tables, partitioning); see the Hive/PySpark sketch after this list.
  • Used Python and Spark to scrape, clean and analyze large datasets
  • Worked with a team to create machine learning models for customer segmentation using k-means
  • Developed an algorithm in Python and PySpark, automated in Machine Learning Studio, to predict the likelihood of American Express customers booking a flight each month
  • Involved in running Map Reduce jobs for processing millions of records.
  • Developed Python programs for manipulating data read from various Teradata tables and converting them into a single CSV file; interacted with other data scientists and architects on custom data visualization solutions using tools such as Dataiku and SAS Viya.
  • Worked on Hive for exposing data for further analysis and for transforming files from different analytical formats into text files.
  • Created machine learning models with Python and scikit-learn to predict anomalies in real time.
  • Performed time-series forecasting on customer historical data
  • Developed an algorithm in Python and PySpark that automated anomaly detection in real time
  • Applied k-means and DBSCAN clustering algorithms to develop models (see the clustering sketch after this list)
  • Identified an issue and developed a procedure for correcting it, which improved the quality of critical tables by eliminating the possibility of entering duplicate data into the data warehouse; interacted with other data scientists and architects on custom data visualization solutions using Python
  • Developed MapReduce Python modules for machine learning & predictive analytics in Hadoop
  • Develop / Auto deploy content using AWS (Amazon Web Services), GIT/Bitbucket, Maven, Jenkins
  • Develop integration solutions between AEM, AWS (Lambda, S3, API Gateway and Cloud Formation) and Spredfast (Social) Platforms
  • Worked on AWS Elastic Beanstalk to deploy, monitor, and scale an application
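
The Hive work above (external tables, partitioning, refining raw data into the EDW) can be sketched in PySpark as follows. This is a minimal illustration; the database, table, and column names are hypothetical, not the actual project schemas.

```python
# Minimal PySpark sketch: read a Hive staging table, apply light cleansing, and
# write the refined data back as a partitioned Hive table. Names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("hive-refinement-sketch")
         .enableHiveSupport()          # required to read/write Hive tables
         .getOrCreate())

raw = spark.table("staging_db.raw_events")                 # hypothetical source
refined = (raw
           .filter(F.col("event_type").isNotNull())        # basic cleansing rule
           .withColumn("event_date", F.to_date("event_ts")))

(refined.write
        .mode("overwrite")
        .partitionBy("event_date")                         # partitioned target
        .saveAsTable("edw.refined_events"))                # hypothetical EDW table
```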
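
The k-means segmentation and real-time anomaly work above can likewise be sketched with scikit-learn. The data below is synthetic and the cluster count and DBSCAN parameters are arbitrary; this illustrates the technique, not the production models.

```python
# Minimal sketch: k-means for customer segmentation and DBSCAN noise points as
# candidate anomalies. Features, k, eps, and min_samples are placeholders.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans, DBSCAN

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 5))          # synthetic customer features
X = StandardScaler().fit_transform(features)   # scale before distance-based methods

segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)

labels = DBSCAN(eps=0.8, min_samples=10).fit_predict(X)
anomalies = np.where(labels == -1)[0]          # DBSCAN marks sparse points as -1
print(f"{len(anomalies)} candidate anomalies out of {len(X)} records")
```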

Environment: Spark, MS Office, Teradata, AWS, ML Studio, XML, Hive, HDFS, Flume, Python, R, Tableau 9.2

Confidential, Salt Lake City, UT

Data Engineer

Responsibilities:

  • Built models using statistical techniques like Bayesian HMM and machine learning classification models like XGBoost, SVM, and Random Forest.
  • A highly immersive data science program involving data manipulation & visualization, web scraping, machine learning, Python programming, SQLAlchemy, Git, Unix commands, NoSQL, MongoDB, and Hadoop.
  • Setup storage and Data Analysis tools in Amazon Web Services cloud computing infrastructure.
  • Used pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, NLTK in Python for developing various machine learning algorithms.
  • Installed and used Caffe Deep Learning Framework
  • Worked on different data formats such as JSON, XML and performed machine learning algorithms in Python.
  • Worked with Data Architects and IT Architects to understand the movement of data and its storage, using ER Studio 9.7.
  • Participated in all phases of data mining; data collection, data cleaning, developing models, validation, visualization and performed Gap analysis.
  • Performed data manipulation and aggregation from different sources using Nexus, Toad, Business Objects, Power BI, and Smart View.
  • Focused on integration overlap and Informatica's newer commitment to MDM following the acquisition of Identity Systems.
  • Good knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, Secondary Name Node, and Map Reduce concepts.
  • Created SSIS packages to automate the process of extracting data and loading it into MS Dynamics CRM.
  • Queried databases through SQLAlchemy for customer production issue resolutions.
  • Developed dynamic reports using the CRM reporting interface and SSRS, and deployed them on CRM, where they were used in the application depending on user roles.
  • Used Teradata 15 utilities such as FastExport and MultiLoad (MLOAD) for tasks including data migration/ETL from OLTP source systems to OLAP target systems
  • Experience in Hadoop ecosystem components like Hadoop Map Reduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Flume including their installation and configuration.
  • Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we would be able to assign each document a response label for further classification.
  • Performed data transformation from various sources, data organization, and feature extraction from raw and stored data.
  • Extracted data from HDFS and prepared data for exploratory analysis using data munging.
  • Configured and installed Informatica MDM Hub server, cleanse, resource kit, and Address Doctor.
  • Involved with Master Data Management (MDM) for Customer Data Integration using the Siperian tool.
  • Handled Master Data Management (MDM) Hub Console configurations such as Stage Process configuration, Load Process configuration, and Hierarchy Manager configuration.
  • Implemented reprocessing of failed messages in Kafka using offset IDs (see the Kafka sketch after this list).
  • Implemented Kafka producer and consumer applications on a Kafka cluster set up with the help of ZooKeeper.
  • Used Spring Kafka API calls to process messages smoothly on the Kafka cluster setup.
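
The Kafka work above used Spring Kafka on the JVM; the sketch below shows the same produce/consume and offset-based reprocessing idea with the Python kafka-python client. The broker address, topic name, partition, and failed offset are hypothetical.

```python
# Minimal kafka-python sketch: produce a message, then reprocess a consumer
# partition starting from a recorded failure offset. All names are placeholders.
from kafka import KafkaProducer, KafkaConsumer, TopicPartition

BROKERS = ["localhost:9092"]      # hypothetical broker list
TOPIC = "orders"                  # hypothetical topic

producer = KafkaProducer(bootstrap_servers=BROKERS)
producer.send(TOPIC, b'{"order_id": 1}')
producer.flush()

consumer = KafkaConsumer(bootstrap_servers=BROKERS,
                         group_id="reprocess-demo",
                         enable_auto_commit=False)
partition = TopicPartition(TOPIC, 0)
consumer.assign([partition])
failed_offset = 42                # offset id recorded when processing failed
consumer.seek(partition, failed_offset)

for message in consumer:          # re-handle messages from the failed offset on
    print(message.offset, message.value)
    break                         # stop after one message in this sketch
```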

Environment: SQL Server, Oracle 10g/11g, MS Office, Teradata, Informatica, ER Studio, XML, Hive, HDFS, Flume, Sqoop, R connector, SQLAlchemy, Python, R, Tableau 9.2

Confidential, Lakewood, Colorado

Data Analyst

Responsibilities:

  • Understand and analyze business data requirements, architect an accurate, extensible, flexible, and logical data model, and define and implement conceptual, logical, and physical data modeling concepts.
  • Defining Data Sources and data models, documenting actual data flows, data exchanges, and systems interconnections and interfaces. Ensuring these are aligned with the enterprise data model.
  • Develop and optimize ETL processes by working closely with multiple data partners and stakeholders across the company to meet growing business needs.
  • Understand and maintain existing schema with industry standard change management process to achieve zero downtime schema upgrade.
  • Design and build high-volume, real-time data ingestion frameworks and automate ingestion from various data sources into big data technologies such as Hadoop.
  • Performed data mapping from source systems to target systems and logical data modeling, created class diagrams, and used SQL queries to filter data.
  • Developed MapReduce programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW (a Python streaming sketch follows this list).
  • Identify and predict technology needs related to data and reporting for the organization, and propose technology solutions.
  • Develop and keep current, a high-level data strategy that fits with the Data Warehouse Standards and the overall strategy of the Company.
  • Designed different types of star schemas using Erwin with various dimensions like time, services, and customers, and fact tables.
  • Analyze database infrastructure to ensure compliance with customer security standards, database performance considerations, and reverse engineering of existing database environments.
  • Used Hive, created Hive tables, and was involved in data loading and writing Hive UDFs.
  • Creation of BTEQ, Fast export, MultiLoad, TPump, Fast load scripts for extracting data from various production systems.
  • Creation of database objects like tables, views, materialized views, procedures, and packages using Oracle tools like PL/SQL, SQL*Plus, and SQL*Loader, and handled exceptions.
  • Worked on Hive for exposing data for further analysis and for transforming files from different analytical formats into text files.
  • Extensively used Erwin for developing data model using star schema methodologies.
  • Worked on importing and exporting data from Oracle and DB2 into HDFS using Sqoop
  • Created, optimized, reviewed, and executed Teradata SQL test queries to validate transformation rules used in the source-to-target mappings/source views, and to verify data in target tables.
  • Used Pig as ETL tool to do transformations, event joins and some pre-aggregations before storing the data onto HDFS.
  • Provided ad-hoc queries and data metrics to the Business Users using Hive, Pig.
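
The Python MapReduce parsing and aggregation described above can be sketched as a Hadoop Streaming job. The tab-delimited record layout, key position, and invocation are assumptions for illustration only.

```python
# mr_sketch.py: minimal Hadoop Streaming mapper/reducer that parses raw delimited
# records and counts rows per key, the kind of pre-aggregation used to populate
# staging tables. Illustrative run:
#   hadoop jar hadoop-streaming.jar -input /raw -output /staging \
#     -mapper "python mr_sketch.py map" -reducer "python mr_sketch.py reduce" \
#     -file mr_sketch.py
import sys

def mapper():
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:            # skip malformed records
            continue
        print(f"{fields[0]}\t1")       # emit key and a count of 1

def reducer():
    current_key, total = None, 0
    for line in sys.stdin:             # input arrives sorted by key
        key, count = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{total}")
            current_key, total = key, 0
        total += int(count)
    if current_key is not None:
        print(f"{current_key}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```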

Environment: ERWIN 9.x, Informatica Power Mart (Source Analyzer, Data warehousing designer, Mapping Designer, Transformations), MS SQL Server, Oracle, SQL, Hive, Map Reduce, PIG, Sqoop, HDFS, Hadoop, Teradata, Netezza, PL/SQL, Informatica, SSIS, SSRS.

Confidential

Data Modeler/Data Analyst

Responsibilities:

  • Worked with business users to gather requirements and create data flow, process flow, and functional specification documents.
  • Developed Data Mapping, Data Governance and transformation and cleansing rules for the Master Data Management Architecture involving OLTP, ODS.
  • Based on client requirements, created design documents for Workday reporting and created a dashboard that gives all the information regarding those reports.
  • Developed, enhanced, and maintained snowflake schemas within the data warehouse and data marts with conceptual data models.
  • Designed 3rd normal form target data model and mapped to logical model.
  • Involved in extensive data validation using SQL queries and back-end testing; used SQL for querying the database in a UNIX environment
  • Involved with data analysis, primarily identifying data sets, source data, source metadata, data definitions, and data formats
  • Involved in data analysis and creating data mapping documents to capture source-to-target transformation rules.
  • Used ER Studio and Visio to create 3NF and dimensional data models and published them to the business users and ETL teams.
  • Involved in data mapping specifications to create and execute detailed system test plans. The data mapping specifies what data will be extracted from an internal data warehouse, transformed, and sent to an external entity.
  • Developed Informatica SCD Type-I, Type-II, and Type-III mappings and tuned them for better performance (a Type-II sketch follows this list). Extensively used almost all Informatica transformations, including complex Lookups, Stored Procedures, Update Strategy, mapplets, and others.
  • Creating or modifying T-SQL queries as per business requirements and worked on creating role-playing dimensions, factless facts, snowflake, and star schemas.
  • Used the ER Studio modeling tool for publishing a data dictionary, reviewing the model and dictionary with subject matter experts, and generating data definition language.
  • Extracted data from databases Oracle, Teradata, Netezza, SQL server and DB2 using Informatica to load it into a single repository for Data analysis.
  • Involved in development and implementation of SSIS, SSRS and SSAS application solutions for various business units across the organization.
  • Managed full SDLC processes involving requirements management, workflow analysis, source data analysis, data mapping, metadata management, data quality, testing strategy and maintenance of the model.
  • Created custom Workday reports and modified/troubleshot existing custom reports.
  • Created and modified several UNIX shell Scripts according to the changing needs of the project and client requirements.
  • Identified and tracked the slowly changing dimensions, heterogeneous sources and determined the hierarchies in dimensions.
  • Built and published customized interactive reports and dashboards, and scheduled reports using Tableau Server.
  • Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects.
  • Wrote reports using Report Writer that extract Workday data and manipulate it in other formats (Excel) for various needs.
  • Used Teradata utilities such as Fast Export, MLOAD for handling various tasks.
  • Analysis of functional and non-functional categorized data elements for data profiling and mapping from source to target data environment. Developed working documents to support findings and assign specific tasks.
  • Translated business requirements to technical requirements in terms of BO (Business Objects) universe and report design.
  • Involved in fixing invalid mappings, testing of stored procedures and functions, and unit and integration testing of Informatica sessions, batches, and the target data.
  • Involved in the validation of OLAP unit testing and system testing of the OLAP report functionality and the data displayed in the reports.
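
The SCD Type-II mappings above were built in Informatica; the pandas sketch below only illustrates the expire-and-insert pattern behind Type II. Column names, keys, and dates are hypothetical.

```python
# Minimal pandas sketch of SCD Type II: expire the current row for changed keys
# and append a new current row. Columns and sample values are placeholders.
import pandas as pd

dim = pd.DataFrame({                      # existing dimension table
    "customer_id": [1, 2],
    "city": ["Denver", "Boulder"],
    "effective_date": ["2020-01-01", "2020-01-01"],
    "end_date": [None, None],
    "is_current": [True, True],
})
source = pd.DataFrame({                   # incoming source snapshot
    "customer_id": [1, 2],
    "city": ["Denver", "Lakewood"],       # customer 2 changed city
})

load_date = "2024-01-01"
merged = dim.merge(source, on="customer_id", suffixes=("", "_new"))
changed = merged[merged["city"] != merged["city_new"]]

# Expire the current version of each changed row...
expire_mask = dim["customer_id"].isin(changed["customer_id"]) & dim["is_current"]
dim.loc[expire_mask, ["end_date", "is_current"]] = [load_date, False]

# ...then append a new current version carrying the changed attribute.
new_rows = (changed[["customer_id", "city_new"]]
            .rename(columns={"city_new": "city"})
            .assign(effective_date=load_date, end_date=None, is_current=True))
dim = pd.concat([dim, new_rows], ignore_index=True)
print(dim)
```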

Environment: ER Studio, Informatica PowerCenter 8.1/9.1, PowerConnect/PowerExchange, Oracle 11g, Mainframes, DB2, MS SQL Server 2008, SQL, PL/SQL, XML, Windows NT 4.0, Tableau, Workday, SPSS, SAS, Business Objects, Unix Shell Scripting, Teradata, Netezza, Aginity.
