
Data Engineer Resume


Chattanooga, TN

SUMMARY

  • Over 7 years of experience in the IT industry in analytical programming using Python, R, Django, Flask, database design, and Agile methodologies.
  • Experienced with the full software development life cycle (SDLC), architecting scalable platforms, object-oriented programming (OOP), database design, and Agile methodologies.
  • Excellent knowledge of Agile methodologies, Scrum stories, and sprints in a Python-based environment, along with data analytics and data wrangling.
  • Experienced in object-oriented and class-oriented programming, multi-threading, algorithms, data structures, and systems programming using Python.
  • Deep understanding of statistical modelling, multivariate analysis, model testing, problem analysis, model comparison, and validation.
  • Experience in Data Analysis, Data Profiling, Data Integration, Migration, Data governance and Metadata Management, Master Data Management and Configuration Management.
  • Experience with Statistics, Data Analysis, Machine Learning using Python and R language.
  • Experience in SQL programming and creation of relational database models.
  • Hands-on experience with R packages and libraries like ggplot2, Shiny, h2o, dplyr, reshape2, plotly, R Markdown, ElemStatLearn, caTools, etc.
  • Proficient in Tableau and R Shiny data visualization tools to analyze and obtain insights into large datasets, and to create visually powerful and actionable interactive reports and dashboards.
  • Hands-on experience with Python and R libraries for Data Validation, Predictive modeling, Descriptive Analysis, and Data Visualization tools (a minimal sketch follows this summary).
  • Hands-on experience in installing, configuring, and using Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala, Solr, Git, Maven, Avro, JSON, and Chef.
  • Working knowledge of container systems, including setting up virtual machines and Docker instances.
  • Experience in using various packages in R and Python like ggplot2, caret, dplyr, Rweka, gmodels, RCurl, tm, twitteR, NLP, Reshape2, rjson, plyr, pandas, numpy, seaborn, scipy, matplotlib, scikit-learn, Beautiful Soup, and Rpy2.
  • Experience in using Cloud Computing Services like Amazon Web Services (AWS), and Microsoft Azure.
  • Experience with Data Analytics, Data Reporting, Ad-hoc Reporting, Graphs, Scales, Pivot Tables, and OLAP reporting.
  • Experience in creating cutting edge data processing algorithms to meet project demands.
  • Experience with Forecasting, Time Series Analysis, statistical modelling, etc.
  • Experience in Data collection, Data Extraction, Data Cleaning, Data Aggregation, Data Mining, Data verification, Data analysis, Reporting, and data warehousing environments.
  • Hands-on experience in provisioning virtual clusters under the Amazon Web Services (AWS) cloud, which includes services like Elastic Compute Cloud (EC2), S3, and EMR.
  • Experienced in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
  • Professional experience building data pipelines using cloud computing platforms (Azure, AWS, and GCP).
  • Experience in object-oriented programming (OOP) concepts using Python.
  • Experience in designing the performance review dashboards using Tableau and Power BI at the organization level for weekly, monthly, quarterly, and yearly reviews.
  • Hands on experience in setting up workflow using Apache Airflow and Oozie workflow engine for managing and scheduling Hadoop jobs.
  • Hands-on experience in the Extract, Transform, and Load (ETL) process using Informatica PowerCenter/DataStage and SSIS.
  • Experience and technical proficiency in designing and data modeling online applications; solution lead for architecting Data Warehouse/Business Intelligence applications.
  • Good knowledge in Hypothesis Testing, T-Test, Z Test, Gradient descent, Newton’s Method, ANOVA test, Chi-square test. Python Libraries: Numpy, Pandas, Matplotlib, Scikit-learn, NLTK, plotly, Seaborn, Scikit-Image, and OpenCV Tools.
  • Expertise in designing Conceptual, Logical and Physical Data Models for various environments and business processes.
  • Extensive experience in designing normalized data models by business process and use case.
  • Performed data mapping from source systems to local systems, performed logical data modeling, created class diagrams and ER diagrams, and used SQL queries to filter data.
  • Expertise in usage of Django Framework for developing web applications.
  • Excellent experience with NoSQL databases like MongoDB and Cassandra, and with writing Apache Spark Streaming API applications on Big Data distributions in an active cluster environment.
  • Good knowledge of establishing database connections for Python by configuring packages such as MySQL-Python.
  • Extensive knowledge in various reporting objects like Facts, Attributes, Hierarchies, Transformations, filters, prompts, Calculated fields, Sets, Groups, Parameters etc., in Tableau.
  • Proficient in Star Schema dimensional modeling, which includes understanding business processes and modeling dimensions, hierarchies, and facts.
  • Strong experience with big data processing using Hadoop technologies: MapReduce, Apache Spark, Apache Crunch, Hive, Pig, and YARN.
  • Expertise in broad range of technologies, including business process tools such as Microsoft Project, Primavera, MS Access, MS Visio, technical assessment tools, Data Warehousing concepts and web design and development.
  • Good experience of software development in Python (libraries used: Beautiful Soup, PySpark, NumPy, SciPy, Matplotlib, asyncio, python-twitter, Pandas DataFrame, network, urllib2, MySQL for database connectivity) and IDEs - Sublime Text, Spyder, PyCharm, pytest.
  • Experience in using design patterns such as MVC and Singleton, and frameworks such as Django and Flask.
  • Experienced in developing Web Services with Python programming language.
  • Proficient in design and development of various dashboards and reports, utilizing Tableau visualizations like bar graphs, scatter plots, pie charts, geographic visualizations, and others, making use of actions and other local and global filters according to end-user requirements.
  • In-depth experience in AWS using EC2, Volume and Snapshot management, AWS Dynamo DB, AWS S3, AWS RDS, AWS VPC, Route 53, Elastic Beanstalk and IAM services.
  • Experience in writing subqueries, stored procedures, triggers, cursors, and functions on MySQL, Cassandra, and MongoDB databases using SQL and PL/SQL.
  • Experienced in developing custom reports and different types of chart reports, tabular reports, matrix reports, and distributed reports in multiple formats through Tableau.
  • Ability to learn and adapt quickly to the emerging new technologies and paradigms.
  • Good analytical and problem-solving skills and ability to work independently, besides being a valuable and contributing team player.
  • Excellent Interpersonal and communication skills, efficient time management and organization skills, ability to handle multiple tasks and work well in a team environment.
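
The predictive-modeling and data-validation bullets above can be illustrated with a minimal pandas/scikit-learn sketch; the file name, feature columns, and target field below are illustrative assumptions rather than details from any specific project listed here.

```python
# Minimal sketch of a pandas + scikit-learn predictive-modeling workflow.
# "claims.csv" and the "target" column are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

df = pd.read_csv("claims.csv")                    # hypothetical source file
df = df.dropna(subset=["target"])                 # basic cleaning before modeling

X = pd.get_dummies(df.drop(columns=["target"]))   # one-hot encode categorical features
y = df["target"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))
```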

TECHNICAL SKILLS

Software Methodologies: SDLC - Waterfall, Agile, SCRUM

Programming Languages: Python, SAS, R

Operating Systems: Linux, Windows

Python Libraries: Pandas, NumPy, Matplotlib, Scikit-learn

Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Spark, Sqoop

Data Visualization/ BI/ Reporting Tools: Tableau, Power BI, Google Analytics, SSRS, Crystal reports

ETL Tools: SSIS, Informatica, DataStage, Talend

Databases: Oracle 12c/11g/10g, SQL Server, DB2, R-Base, Cassandra

Database Tools: SQL*Plus, T-SQL, SSAS, SSIS, SQL*Loader, PL/SQL, PostgreSQL

Data Modelling: MS Visio, Erwin

Cloud: AWS, MS Azure, Docker

PROFESSIONAL EXPERIENCE

Confidential, Chattanooga, TN

Data Engineer

Responsibilities:

  • Created and ran advanced SQL queries to pinpoint clients' data issues.
  • Created an issue-tracking system using a BI dashboard.
  • Documented business rules and implemented data transformations using Pandas.
  • Analyzed ETL mappings based on facts and dimensions from source to target tables for direct and indirect moves, based on transformation rules and lookup tables.
  • Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Desktop.
  • Developed Scala scripts using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Analyzed the business requirements for web applications and SharePoint application.
  • Utilized Apache Spark with Python to develop and execute Big Data analytics and machine learning applications; executed machine learning use cases under Spark ML and MLlib.
  • Developed and deployed various Lambda functions in AWS with in-built AWS Lambda Libraries and also deployed Lambda Functions in Scala with custom Libraries.
  • Automated resulting scripts and workflow using Apache Airflow and shell scripting to ensure daily execution in production.
  • Actively involved in writing T-SQL Programming for implementing Stored Procedures and Functions and cursors, views for different tasks.
  • Used Apache Falcon for mirroring of HDFS and Hive data.
  • Participate in the design, build and deployment of NoSQL implementations like MongoDB.
  • Extensively used Apache Kafka, Apache Spark, HDFS, and Apache Impala to build near-real-time data pipelines that ingest, transform, store, and analyze clickstream data to provide a better personalized user experience.
  • Worked on Big data on AWS cloud services i.e., EC2, S3, EMR and DynamoDB.
  • Performed analysis of implementing Spark using Scala and wrote Spark sample programs using PySpark (see the PySpark sketch after this list).
  • Analyzed sales of the past few years using data mining tools like RStudio and Python.
  • Developed SSRS reports, SSIS packages to Extract, Transform and Load data from various source systems.
  • Worked on ETL testing and used SSIS tester automated tool for unit and integration testing.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Developed an AWS Lambda function to invoke a Glue job as soon as a new file arrives in the inbound S3 bucket (see the Lambda sketch below).
  • Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch.
  • Designed and implemented secure data pipelines into a Snowflake data warehouse from on-premise and cloud data sources.
  • Created best practices and standards for data pipelining and integration with Snowflake data warehouses.
  • Functioned as SME and managed new system that streamlined customer information process and increased productivity.
  • Involved with all the phases of Software Development Life Cycle (SDLC) methodologies throughout the project life cycle.
  • Worked extensively with Dimensional modeling, Data migration, Data cleansing, ETL Processes for data warehouses.
  • Developed data Mart for the base data in Star Schema, Snow-Flake Schema.
  • Used Hive, Impala and HBase as part of Cloudera Hadoop/big data testing.
  • Used Web services (SOAP) for transmission of large blocks of XML data over HTTP.
  • Developed Test Cases for Report, SharePoint, and Web applications for UAT environment.
  • Extensively worked on developing test plans and test strategies for ETL, report, SharePoint, and web application projects.
  • Created Use Case Diagrams, Activity Diagrams, Sequence Diagrams and ER Diagrams in MS Visio.
  • Involved in writing T-SQL working on SSIS, SSAS, Data Cleansing, Data Scrubbing and Data Migration.
  • Worked on claims data and extracted data from various sources such as flat files, Oracle, and Mainframes.
  • Designed and generated various dashboards, reports using various Tableau Visualizations.
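
A minimal PySpark sketch of the DataFrame/Spark SQL aggregation pattern referenced above; the input path, table, and column names are illustrative assumptions (the project also used Scala and wrote results back to the OLTP system through Sqoop rather than the Parquet write shown here).

```python
# Hedged sketch: equivalent DataFrame-API and Spark SQL aggregations.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-aggregates").getOrCreate()

orders = spark.read.parquet("/data/raw/orders")        # hypothetical HDFS path

# DataFrame API aggregation
daily = (orders.groupBy("order_date")
               .agg(F.sum("amount").alias("total_amount"),
                    F.countDistinct("customer_id").alias("customers")))

# Equivalent Spark SQL form
orders.createOrReplaceTempView("orders")
daily_sql = spark.sql("""
    SELECT order_date,
           SUM(amount)                 AS total_amount,
           COUNT(DISTINCT customer_id) AS customers
    FROM orders
    GROUP BY order_date
""")

# Persist the aggregate (illustrative output path)
daily.write.mode("overwrite").parquet("/data/curated/daily_orders")
```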

Environment: SDLC- Agile/Scrum, Python, VBA, R-Studio, Hive, Pandas, NumPy, T-SQL, Visio, Matplotlib, Seaborn, SQL, R-Base, UNIX, Linux, Windows, Git, Scala, AWS Lambda, Kinesis(streams), Oracle, NoSQL, PostgreSQL, DataStage, PySpark, Power BI, Tableau, SQL Server, OLAP, SharePoint, OLTP, ETL, SQL Query.
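
A hedged sketch of the S3-triggered Lambda that starts a Glue job, as described in the bullets above; the Glue job name and argument key are hypothetical, and the handler assumes a standard S3 event notification payload.

```python
# Hedged sketch: Lambda handler that starts a Glue job run for each new S3 object.
import boto3

glue = boto3.client("glue")

def lambda_handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        response = glue.start_job_run(
            JobName="inbound-file-load",                        # hypothetical Glue job name
            Arguments={"--source_path": f"s3://{bucket}/{key}"},  # illustrative argument key
        )
        print(f"Started Glue run {response['JobRunId']} for s3://{bucket}/{key}")
    return {"status": "ok"}
```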

Confidential, Waterbury, CT

Data Engineer

Responsibilities:

  • Worked with applications like R and Python to develop neural network algorithms and cluster analysis.
  • Conducted a range of statistical analyses to provide valuable data-driven insights for business decision making.
  • Worked with packages like ggplot2 and Shiny in R to understand data and develop applications.
  • Prepared ETL technical Mapping Documents along with test cases for each Mapping for future developments to maintain SDLC and Migration process.
  • Actively took part in Data Profiling, Data Cleansing, Data Migration, and Data Mapping, and actively helped ETL developers compare data with original source documents and validate data accuracy.
  • Worked on Tableau to create dashboards and visualizations.
  • Designed and developed the framework to consume the web services hosted in Amazon EC2 instances.
  • Analyzed customer data in Python and R to track correlations in customer behavior and define user segments to implement process and product improvements.
  • Encoded and decoded JSON objects using PySpark to create and modify DataFrames in Apache Spark.
  • Performed real-time streaming of data using Spark with Kafka.
  • Installed and configured Apache Airflow for the S3 bucket and Snowflake data warehouse and created DAGs to run Airflow (see the DAG sketch after this list).
  • Optimized the performance of queries with modifications in T-SQL queries, removed unnecessary columns, eliminated redundant and inconsistent data, normalized tables, established joins and created indexes whenever necessary.
  • Extensively used Apache Sqoop for efficiently transferring bulk data between Apache Hadoop and relational databases (Oracle, MySQL) for predictive analytics.
  • Implemented AWS Lambda functions and a Python script that pulls the privacy files from AWS S3 buckets and posts them to the Malibu data privacy endpoints.
  • Created Hive tables on top of HBase using a StorageHandler for effective OLAP analysis.
  • Designed, set up, maintained, and administered Azure SQL Database, Azure Analysis Services, Azure SQL Data Warehouse, and Azure Data Factory.
  • Extensively worked on OLAP cubes using SSAS; created and managed OLAP cubes using SSAS.
  • Worked on cleaning, exploring, and manipulating source data and transforming it to the target system using Python and tools such as Pandas, NumPy, Matplotlib, and PostgreSQL.
  • Developed statistical models designed to forecast market variables under stress scenarios within financial models using R.
  • Gathered and analyzed business requirements, interacted with various business users, project leaders, developers and took part in identifying different data sources.
  • Implemented large Lambda architectures using Azure Data platform capabilities like Azure Data Lake, Azure Data Factory, HDInsight, Azure SQL Server, Azure ML, and Power BI.
  • Created workflow and process-flow diagrams in MS Visio for various projects.
  • Used MongoDB to store data in JSON format and developed and tested many features of the dashboard using Python, Bootstrap, CSS, and JavaScript (see the PyMongo sketch below).
  • Analyzed the business requirements and designed Conceptual and Logical Data models using Erwin and generated database schemas and DDL (Data Definition Language) by using Forward and Reverse Engineering.
  • Well versed in creating pipelines in Azure Data Factory using different activities like Move & Transform, Copy, Filter, ForEach, Databricks, etc.
  • Created Hive tables on HDFS to store the data processed by Apache Spark on the Cloudera Hadoop Cluster in Parquet format.
  • Worked as data modeler/analyst by creating and developing relational and dimensional data models by using Erwin.
  • Implemented normalization and de-normalization techniques to build tables, indexes, and views, and maintained and implemented stored procedures as per requirements; created pipeline jobs and schedule triggers using Azure Data Factory.
  • Changed the design in SharePoint for data types and added metrics.
  • Imported and exported reports from SharePoint and Access.
  • Involved in data migration processes to migrate historical data from legacy system into AWS Redshift (to S3 as flat files and Redshift as tables) using Python.
  • Involved in creating charts and graphs of the data from different data sources by using matplotlib and scipy libraries in python.
  • Used ad hoc queries for querying and analyzing the data, participated in performing data profiling, data analyzing, data validation and data mining.
  • Developed complex ETL mappings for Stage, Dimensions, Facts and Data marts load.
  • Involved in Data Extraction for various Databases & Files using Talend.
  • Worked on Tableau for data analysis, digging into the data from source systems and diving deep into the data for predictive findings and various data analyses using dashboards and visualizations.
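
A minimal Airflow DAG sketch for the S3-to-Snowflake scheduling described above; the DAG id, schedule, and task callables are illustrative assumptions (the real DAGs would call S3 and Snowflake hooks or operators configured for the project).

```python
# Hedged sketch of an Airflow DAG coordinating an S3-to-Snowflake load.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path

def stage_from_s3(**context):
    """Placeholder: copy the day's files from the S3 bucket to a staging area."""
    ...

def load_into_snowflake(**context):
    """Placeholder: run COPY INTO / MERGE statements against Snowflake."""
    ...

with DAG(
    dag_id="s3_to_snowflake_daily",       # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    stage = PythonOperator(task_id="stage_from_s3", python_callable=stage_from_s3)
    load = PythonOperator(task_id="load_into_snowflake", python_callable=load_into_snowflake)
    stage >> load                          # run the staging task before the load
```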

Environment: Windows, Linux, Java, R, R studio, Python, ETL, Talend, Pandas, Numpy, R-Base, T-SQL, Visio, Azure Databricks, R-Shiny, Matplotlib, SSIS, MongoDB, Git, SharePoint, PostgreSQL, Hive, Nifi, Tableau, Azure Data Factory, Azure Analysis Services, ETL, SQL, SDLC, Agile, Scrum, MySQL.
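
A small PyMongo sketch of storing and reading JSON documents in MongoDB, as used for the dashboard described above; the connection string, database, collection, and document shape are illustrative assumptions.

```python
# Hedged sketch: persisting and querying JSON documents with PyMongo.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")   # hypothetical connection string
collection = client["dashboard"]["widgets"]         # hypothetical database/collection

doc = {
    "widget": "sales_by_region",
    "period": "2021-06",
    "values": {"east": 1200, "west": 950},
}
collection.insert_one(doc)

for row in collection.find({"widget": "sales_by_region"}):
    print(row["period"], row["values"])
```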

Confidential

Data Engineer

Responsibilities:

  • Understood and articulated business requirements from user interviews and then converted requirements into technical specifications.
  • Gathered and analyzed business requirements, interacted with various business users, project leaders, developers and took part in identifying different data sources.
  • Involved with Data Analysis primarily Identifying Data Sets, Source Data, Source Meta Data, Data Definitions and Data Formats.
  • Generated new data mapping documentations and redefined the proper requirements in detail.
  • Used pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn in Python for developing various machine learning algorithms.
  • Created and deployed data dashboards on Docker to visualize inquiries, querying in SQL to retrieve data as per client inquiries and providing daily reports using production data and experimental analysis using development data.
  • Wrote test cases, developed Test scripts using SQL and PL/SQL for UAT.
  • Worked with Apache Spark SQL and DataFrame functions to perform data transformations and aggregations on complex semi-structured data.
  • Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and uploaded it to the database.
  • Identified issues within the data by querying the source data and identifying the data patterns.
  • Visually plotted the data using matplotlib and Seaborn after performing analysis with pandas.
  • Developed Spark applications using Spark and Java, and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
  • Extracted, transformed, and loaded data from source systems to Azure Data Storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics); ingested data to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks.
  • Using pandas DataFrames, performed group-by, merging, and joining operations as in SQL (see the pandas sketch after this list).
  • Collected business requirements to set rules for proper data transfer from data source to data target in data mapping, using ETL tools like Informatica for loading data into the staging tables in the database.
  • Read data from different sources like CSV files, Excel, HTML pages, and SQL, performed data analysis, and wrote results back to data sources like CSV files, Excel, or databases.
  • Fetched Twitter feeds for certain important keywords using the python-twitter library.
  • Performed data visualization with Matplotlib, R Shiny, ggplot2, and Tableau.
  • Implemented advanced procedures like text analytics and processing using in-memory computing capabilities like Apache Spark, written in Scala.
  • Performed data testing, tested ETL mappings (Transformation logic), tested stored procedures, and tested the XML messages.
  • Visualized the data with the help of box plots and scatter plots to understand the distribution of data, using Tableau and Python libraries.
  • Developed merge jobs in Python to extract and load data into a MySQL database; also worked on Python ETL file loading and the use of regular expressions.
  • Worked on data pre-processing and cleaning the data to perform feature engineering, and performed data imputation techniques for the missing values in the dataset using Python (see the imputation sketch below).
  • Used extracted data for analysis and carried out various mathematical operations for calculation purposes using the Python libraries NumPy and SciPy.
  • Performed data wrangling and scripting in Python, database cleanup in SQL, and advanced model building in R/Python, with expertise in data visualization.
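
A brief pandas sketch of the SQL-style group-by and join operations mentioned above; the example frames and column names are invented purely for illustration.

```python
# Hedged sketch: pandas equivalents of SQL GROUP BY and JOIN.
import pandas as pd

orders = pd.DataFrame({"customer_id": [1, 1, 2], "amount": [100.0, 50.0, 75.0]})
customers = pd.DataFrame({"customer_id": [1, 2], "region": ["east", "west"]})

# GROUP BY equivalent: total amount per customer
totals = orders.groupby("customer_id", as_index=False)["amount"].sum()

# JOIN equivalent: attach the customer's region to each total
report = totals.merge(customers, on="customer_id", how="left")
print(report)
```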

Environment: Python, PostgreSQL, ETL, Informatica, R studio, Scikit-Learn, Seaborn, Numpy, SciPy, MySQL, SQL, PL/SQL, SSAS/SSIS, UAT, CSV, Excel, HTML, Tableau.
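
A hedged sketch of the missing-value imputation step described above, using scikit-learn's SimpleImputer; the toy frame and the choice of median strategy are illustrative assumptions.

```python
# Hedged sketch: median imputation of missing numeric values before modeling.
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({"age": [25, None, 40], "income": [50000.0, 62000.0, None]})

imputer = SimpleImputer(strategy="median")                        # median fill for numeric fields
df[["age", "income"]] = imputer.fit_transform(df[["age", "income"]])
print(df)
```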

Confidential

Data Analyst

Responsibilities:

  • Performed complex data analysis in support of ad-hoc and standing customer requests.
  • Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
  • Wrote complex SQL, PL/SQL, procedures, functions, and packages to validate data and test processes.
  • Used SAS to mine, alter, manage, and retrieve data from a variety of sources and perform statistical analysis.
  • Involved in data mining, transformation and loading from the source systems to the target system.
  • Developed Ad-hoc reports using Tableau Desktop, Excel.
  • Created Star Schema cubes using SSAS.
  • Developed company dashboards to integrate with SharePoint sites and provide users with one-stop shopping for reporting.
  • Defined specifications like use case documentation, activity diagram, and business process flow using Microsoft Visio.
  • Designed the user interface using HTML/DHTML and JavaScript.
  • Worked with data investigation, discovery, and mapping tools to scan every single data record from many sources.
  • Worked on database design for OLTP and OLAP systems
  • Translated business concepts into XML vocabularies by designing XML Schemas with UML
  • Designed security model of reporting system, creating groups, folders, and users in Business Objects.
  • Created SSIS packages to populate data from various data sources.
  • Performed the Data Accuracy, Data Analysis, Data Quality checks before and after loading the data.
  • Developed database triggers and stored procedures using T-SQL cursors and tables.
  • Deployed reports, created report schedules and subscriptions, and managed and secured reports using SSRS.
  • Created Crystal Reports depending upon end-user needs.

Environment: Python, SQL, PL/SQL, OLAP, OLTP, Tableau, Excel, T-SQL, SSAS/SSIS, R studio, XML, SharePoint, SAS, SSRS, PySpark, Numpy.
