Data Engineer Resume
Des Moines, IA
SUMMARY
- Over 7 years of IT experience with a strong emphasis on requirement gathering, analysis, design, development, implementation, and testing of software applications in Hadoop, HDFS, MapReduce, the Hadoop ecosystem, and RDBMS.
- Experience in GCP (BigQuery, GCS buckets) and Big Data/Hadoop ecosystem components such as MapReduce, Sqoop, Flume, Kafka, Pig, Hive, Spark, Storm, HBase, Airflow, Oozie, and Zookeeper.
- Worked extensively on installing and configuring Hadoop ecosystem components: Hive, Sqoop, HBase, Zookeeper, and Flume.
- Good experience in software development in Python (libraries used: Beautiful Soup, NumPy, SciPy, pandas DataFrames, Matplotlib, network, urllib2, and MySQLdb for database connectivity) and IDEs: Sublime Text, Spyder, PyCharm, and Visual Studio Code.
- Expert in using the Django authentication system and templating system, and in creating models and forms.
- Hands-on experience with AWS services such as S3 buckets, Lambda, API Gateway, Cognito, Step Functions, RDS, and CloudWatch using the CDK.
- Strong experience in business and data analysis, including Data Profiling, Data Migration, Data Mapping, Data Transformation, Data Integration, and Metadata Management Services.
- Efficient in analyzing and documenting business and functional requirements, along with Use Case Modeling and UML.
- Developed and deployed dashboards, visualizations, and dynamic reporting interfaces distributed to stakeholders via the BI reporting platform, web portal, mobile and tablet devices, and widgets.
- Expert knowledge in generating - Use Case Diagrams, Sequence Diagrams, Activity Diagrams, Data Flow Diagrams, Class Diagrams and State Chart Diagrams.
- Good in system analysis, ER/dimensional modeling, database design, and implementing RDBMS-specific features.
- Expertise in 3rd Normal Form (3NF), OLTP, identifying attributes of Facts and Dimensions, Star/Snowflake Modeling for data warehouse.
- Worked extensively on designing canonical, conceptual, logical, and physical data models using ERwin, PowerDesigner, and ER/Studio across several projects in both OLAP and OLTP applications.
- Created network architecture on AWS: VPC, subnets, Internet Gateway, and route tables. Performed S3 bucket creation and configured storage, bucket policies, and IAM role-based policies.
- Worked with MySQL, Oracle, Excel, MS SQL Server databases.
- Experience in creating dashboards using calculations, groups, sets, parameters, calculated fields, and hierarchies.
- Expertise in development on SQL Server 2008/2008 R2/2012/2014, with specialized skills in the Business Intelligence Development Studio (BIDS), which includes SSIS, SSRS, and Report Builder.
- Extensive Tableau experience in enterprise environments, including technical support, troubleshooting, report design, and monitoring of system usage.
- Knowledge of best practices and principles for data visualization, dashboard and report design, marketing analytics, and data mining.
- Strong development experience with the Microsoft BI stack: SQL Server, SSIS, SSRS, SSAS, Power Pivot, Power BI, and SharePoint.
- Extensively involved in Data Extraction, Transformation and Loading (the ETL process) from source to target systems using Informatica PowerCenter 7.1.
- Good experience with the data visualization tools Tableau and QlikView.
- Well versed in conducting gap analysis, Joint Application Design (JAD) sessions, User Acceptance Testing (UAT), cost-benefit analysis, and ROI analysis.
- Extensive use of Rational tools, including RequisitePro, Rose, ClearCase, and ClearQuest, and HP tools such as HP TestDirector (Quality Center).
TECHNICAL SKILLS
Programming Languages: Scala, Python, Unix Shell Scripting, PL/SQL.
Big Data Ecosystem: HDFS, HBase, MapReduce, Hive, Pig, Spark, Kafka, AWS EMR, NiFi, Kinesis, Sqoop, Impala, Oozie, Zookeeper, Flume.
DBMS: Microsoft SQL Server, MySQL, Oracle, MS Access.
NoSQL Databases: HBase, Cassandra, CouchDB, MongoDB.
Web Services: Restful, SOAP.
Servers: Apache Tomcat, WebLogic, and WebSphere.
BI Tools: Tableau, Power BI, QlikView, QuickSight.
PROFESSIONAL EXPERIENCE
Confidential, Des Moines, IA
Data Engineer
Responsibilities:
- Involved in all phases of the SDLC, including requirement gathering, design, analysis, testing of customer specifications, development, and deployment of the application.
- Designed the architecture for parsing applications that fetch data from different services and transform the data for storage in different formats.
- Developed parsers for extracting data from different web service sources and transforming it for storage in formats such as CSV, database files, and HDFS, for downstream analysis.
- Wrote parsers in Python to extract useful data from the design database. Used the Parsekit (Enigma.io) framework to write parsers for ETL extraction.
- Implemented algorithms for data analysis across a cluster of web services.
- Used REST APIs with Python to ingest data from various sites into BigQuery (see the BigQuery ingestion sketch after this list).
- Used SOQL, Google Dataproc, and GCS buckets to load data into GCP on an incremental basis as it was received/collected.
- Worked with lxml to dynamically generate SOAP requests based on the services, and developed a custom hash-key (HMAC) based algorithm in Python for web service authentication (a request-signing sketch also follows this list).
- Worked with the ReportLab PDF library to dynamically generate PDF documents with images and data retrieved from various web services.
- Generated large and complex data extracts and queries for the analytical leads by utilizing various database schemas, including Microsoft SQL Server, Oracle, and Netezza.
- Built the Web API on top of the Django framework to perform REST methods. Used MongoDB and MySQL databases in Web API development. Developed database migrations using SQLAlchemy Migrate.
- Generated graphical reports using the Python packages NumPy and Matplotlib.
- Used advanced features such as pickle/unpickle in Python to share information across applications.
- Managed datasets using pandas DataFrames and MySQL; queried the MySQL database from Python using the Python-MySQL connector and the MySQLdb package to retrieve information.
- Utilized the Python libraries wxPython, NumPy, Twisted, and Matplotlib.
- Wrote Python scripts to parse XML documents and load the data into the database.
- Used Wireshark, Live HTTP Headers, and the Fiddler2 debugging proxy to debug the Flash object and help the developer create a functional component. The PHP page for displaying the data uses AJAX to sort and display it, and also outputs the data to .csv for viewing in Microsoft Excel.
- Added support for Amazon AWS S3 and RDS to host static/media files and the database in the Amazon cloud.
- Wrote Python scripts with CloudFormation templates to automate installation of Auto Scaling, EC2, VPC, and other services.
- Used Docker containers for development and deployment.
- Familiar with UNIX / Linux internals, basic cryptography & security.
- Developed multiple Spark batch jobs in Scala using Spark SQL, performed transformations using many APIs, and updated master data in the Cassandra database per the business requirements.
- Wrote Spark/Scala scripts, creating multiple UDFs, Spark contexts, and Cassandra SQL contexts, along with APIs and methods supporting DataFrames, RDDs, DataFrame joins, and Cassandra table joins, and finally writing/saving the DataFrames/RDDs to the Cassandra database.
- As part of a POC, migrated data from the source systems to another environment using Spark and Spark SQL.
- Developed and implemented core API services using Python with Spark.
- Created DataFrame schemas from raw data stored in Amazon S3 using PySpark (see the PySpark schema sketch after this list).
- Used PySpark DataFrames to create tables and perform analytics over them.
- Created complex stored procedures to perform various tasks including, but not limited to index maintenance, data profiling, metadata searches, and loading of the data mart.
- Used the Jenkins AWS CodeDeploy plugin to deploy to AWS.
- Developed tools using Python and XML to automate some of the menial tasks. Interfaced with supervisors, artists, systems administrators, and production staff to ensure production deadlines were met.
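A minimal sketch of the REST-to-BigQuery ingestion pattern above, assuming the google-cloud-bigquery client library; the endpoint URL, table ID, and payload shape are hypothetical placeholders, not the production pipeline:

    # REST-to-BigQuery ingestion sketch; endpoint and table are hypothetical.
    import requests
    from google.cloud import bigquery

    API_URL = "https://api.example.com/v1/records"   # hypothetical source endpoint
    TABLE_ID = "my-project.my_dataset.raw_records"   # hypothetical destination table

    def ingest():
        # Fetch source records; assumes the API returns a JSON array of objects.
        rows = requests.get(API_URL, timeout=30).json()
        client = bigquery.Client()
        # insert_rows_json streams JSON rows into an existing BigQuery table.
        errors = client.insert_rows_json(TABLE_ID, rows)
        if errors:
            raise RuntimeError(f"BigQuery insert errors: {errors}")

    if __name__ == "__main__":
        ingest()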
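The HMAC-based web service authentication can be sketched with only the standard library; the shared secret, message layout, and header names here are assumptions, not the production scheme:

    # HMAC request-signing sketch using only the standard library.
    # Secret, message layout, and header names are assumptions.
    import hashlib
    import hmac
    import time

    SECRET_KEY = b"shared-secret"  # hypothetical shared secret

    def sign_request(method: str, path: str) -> dict:
        timestamp = str(int(time.time()))
        message = f"{method}\n{path}\n{timestamp}".encode()
        digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
        # The caller attaches these headers to the outgoing SOAP/REST request.
        return {"X-Timestamp": timestamp, "X-Signature": digest}

    print(sign_request("POST", "/service/endpoint"))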
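A PySpark sketch of building a DataFrame with an explicit schema over raw data in S3 and querying it with Spark SQL; the bucket path and column names are hypothetical:

    # PySpark sketch: explicit schema over raw CSV data in S3.
    # Bucket path and column names are hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType

    spark = SparkSession.builder.appName("raw-s3-load").getOrCreate()

    schema = StructType([
        StructField("record_id", StringType(), nullable=False),
        StructField("source", StringType(), nullable=True),
        StructField("amount", DoubleType(), nullable=True),
    ])

    df = spark.read.csv("s3a://example-bucket/raw/", schema=schema, header=True)
    df.createOrReplaceTempView("raw_records")  # enables SQL analytics over the data
    spark.sql("SELECT source, COUNT(*) FROM raw_records GROUP BY source").show()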
Environment: Python 3.x, Parsekit (Enigma.io), Django, Flask, lxml, SUDS, HMAC, pandas, NumPy, Matplotlib, MongoDB, MySQL, SOAP, REST, PyCharm, Docker, AWS (EC2, S3), GCP.
Confidential, Des Moines, IA
Data Engineer
Responsibilities:
- Involved in all phases of SDLC including Requirement Gathering, Design, Analysis and Testing of customer specifications, Development, and Deployment of the Application.
- Researched and recommended a suitable technology stack for the Hadoop migration, considering the current enterprise architecture.
- Responsible for building scalable distributed data solutions using Hadoop.
- Experienced in loading and transforming large sets of structured, semi-structured, and unstructured data.
- Developed Spark jobs and Hive jobs to summarize and transform data.
- Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames, Scala, and Python.
- Utilized Power BI (Power View) to create various analytical dashboards depicting critical KPIs such as legal case matters, billing hours, and case proceedings, with slicers enabling end users to apply filters.
- Expertise in implementing Spark Scala applications using higher-order functions for both batch and interactive analysis requirements.
- Experienced in developing Spark scripts for data analysis in both Python and Scala.
- Wrote Scala scripts to make Spark Streaming work with Kafka as part of the Spark-Kafka integration efforts (a PySpark sketch of this pattern follows this list).
- Built on-premises data pipelines using Kafka and Spark for real-time data analysis.
- Created reports in Tableau to visualize the data sets created, and tested native Drill, Impala, and Spark connectors.
- Worked as a Power BI analyst and supported the Data Systems team in building reports, dashboards, and visualizations; also wrote SQL scripts and reported bugs in Jira to evaluate and test processes.
- Implemented complex Hive UDFs to execute business logic within Hive queries (a PySpark sketch of the UDF-plus-SQL pattern also follows this list).
- Developed various custom filters and handled pre-defined filters on HBase data using the API.
- Used Power BI Desktop to develop data analyses across multiple data sources and visualize the reports.
- Implemented Spark using Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Handled importing data from different data sources into HDFS using Sqoop, performing transformations using Hive and MapReduce, and then loading the data into HDFS.
- Exported result sets from Hive to MySQL using the Sqoop export tool for further processing.
- Collected and aggregated large amounts of log data using Flume and staged the data in HDFS for further analysis.
- Responsible for developing data pipeline by implementing Kafka producers and consumers and configuring brokers.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs, which run independently based on time and data availability.
- Set up Spark on EMR to process huge volumes of data stored in Amazon S3 and used the AWS CLI for data transfers to and from S3 buckets.
- Executed Hadoop/Spark jobs on AWS EMR, with data stored in S3 buckets.
- Built and configured Apache Tez on Hive and Pig to achieve better response times than plain MapReduce jobs.
- Pulled data from an Amazon S3 bucket into the data lake, built Hive tables on top of it, and created DataFrames in Spark to perform further analysis.
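The Spark-Kafka streaming integration above was written in Scala; a PySpark equivalent of the same pattern (broker address and topic are placeholders, and the spark-sql-kafka connector package must be available at runtime) looks roughly like this:

    # PySpark Structured Streaming sketch for the Kafka integration pattern.
    # Broker address and topic name are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("kafka-stream").getOrCreate()

    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "events")
        .load()
        .select(col("value").cast("string").alias("payload"))
    )

    # Write the stream to the console for inspection; production jobs
    # wrote to HDFS/Hive sinks instead.
    query = events.writeStream.outputMode("append").format("console").start()
    query.awaitTermination()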
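A PySpark sketch of the UDF-plus-Spark-SQL pattern used for business logic; the billing-tier rule below is a hypothetical stand-in for the real rules:

    # PySpark sketch of the UDF + Spark SQL pattern; the rule is hypothetical.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    spark = SparkSession.builder.appName("udf-demo").getOrCreate()

    @udf(returnType=StringType())
    def billing_tier(hours):
        # Hypothetical business rule: bucket billing hours into tiers.
        if hours is None:
            return "unknown"
        return "high" if hours > 40 else "standard"

    df = spark.createDataFrame(
        [("case-1", 55.0), ("case-2", 12.5)], ["matter", "hours"]
    )
    df.withColumn("tier", billing_tier("hours")).show()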
Environment: Python, Django, JavaScript, MySQL, NumPy, SciPy, pandas API, PEP, PIP, Jenkins, JSON, Git, AJAX, RESTful web services, PyUnit.
Confidential, Deerfield, IL
Data Analyst
Responsibilities:
- Gathered and translated business requirements into detailed, production-level technical specifications detailing new features and enhancements to existing business functionality.
- Researched and assisted with scaling new data analytics products within the department, including RStudio Server, Tableau, and Microsoft Power BI.
- Supported the development, maintenance, and configuration of reports/dashboards within Oracle Business Intelligence Enterprise Edition (OBIEE).
- Created SSIS packages using proper control-flow and data-flow elements, improving package run time, and unit-tested the developed packages as an ETL developer.
- Worked on the ETL process using Microsoft SQL Server Integration Services (SSIS) to provide data quality, data cleansing, business-rule transformations, and data loading, and supported the data loads and transformations using SSIS for the iterative conversion project.
- Involved in the migration from SQL Server 2008 R2 to Oracle databases using SSIS on the CSS Conversion Project.
- Developed and maintained Tableau visualizations and reports, and developed ad-hoc Business Objects Web Intelligence reports.
- Developed packages to load data from MS SQL Server to Oracle destinations and, in the flow, created new SSIS packages according to the requirements for the fields.
- Built and published customized interactive reports and dashboards, with report scheduling, using Tableau Server (a publishing-automation sketch follows this list).
- Created action filters, parameters, and calculated sets for preparing dashboards and worksheets in Tableau.
- Developed Tableau visualizations and dashboards using Tableau Desktop and provided production support for Tableau users.
- Developed Tableau workbooks from multiple data sources using Data Blending.
- Worked on the ETL process in SSIS packages and improved processing time, using all the different transformations required.
- Worked at the conceptual/logical/physical data model level using Erwin, according to requirements.
- Used cached mode in Tabular Models to integrate data from multiple sources, including relational databases, data feeds, Excel files, and flat text files.
- Conducted extensive quality control and record-keeping procedures to ensure the highest levels of data integrity.
- Managed and tracked end-user requests and troubleshot Tableau issues.
- Provided best practices in data visualization and business intelligence software, including recent versions of Tableau, and provided automation for functional and performance testing of Tableau dashboards.
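Publishing and refreshing Tableau Server content can be automated from Python; a hedged sketch using the tableauserverclient library, with a hypothetical server URL, access token, project ID, and workbook path:

    # Sketch: publishing a workbook to Tableau Server via tableauserverclient.
    # Server URL, token name/value, site, project ID, and file path are hypothetical.
    import tableauserverclient as TSC

    auth = TSC.PersonalAccessTokenAuth("token-name", "token-value", site_id="analytics")
    server = TSC.Server("https://tableau.example.com", use_server_version=True)

    with server.auth.sign_in(auth):
        workbook = TSC.WorkbookItem(project_id="hypothetical-project-id")
        # Overwrite any existing workbook of the same name in the project.
        server.workbooks.publish(
            workbook, "dashboards/sales.twbx", TSC.Server.PublishMode.Overwrite
        )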
Environment: Tableau Desktop 2018.3, Teradata SQL Assistant, Tableau Server, MS Excel, MS SQL Server 2012, MS SQL Server Reporting Services (SSRS) 2012, T-SQL, PL/SQL, Oracle, SQL Server Integration Services.
Confidential, Richmond, VA
Data Analyst
Responsibilities:
- Performed data profiling in the source systems that are required for Data Marts.
- Documented the requirements and the complete process flow, describing program development, logic, testing, implementation, application integration, and coding.
- Worked with internal architects and assisted in the development of current and target state enterprise data architectures.
- Worked with project team representatives to ensure that logical and physical data models were developed in line with corporate standards and guidelines.
- Performed data analysis and data profiling using SQL on various source systems.
- Created SQL scripts to find data quality issues and to identify keys, data anomalies, and data validation issues (a profiling sketch follows this list).
- Involved in defining the source to target data mappings, business rules, and data definitions.
- Followed a structured approach for gathering and formalizing business requirements expressed by managers, process owners and operational users of the information system.
- Responsible for defining the key identifiers for each mapping/interface.
- Conducted JAD/JAR sessions, prepared prototypes, documented workflow processes, and studied business process workflow diagrams.
- Responsible for defining the functional requirement documents for each source to target interface.
- Documented, clarified, and communicated change requests with the requestor and coordinated with the development and testing team.
- Documented data quality and traceability documents for each source interface.
- Involved in data warehouse design and documented the complete process flow describing program development, logic, testing, implementation, application integration, and coding.
- Used data analysis techniques to validate business rules and identified low-quality and missing data in the existing Amgen enterprise data warehouse (EDW).
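The data-quality checks above reduce to parameterized SQL run from Python; this self-contained sketch uses sqlite3 as a stand-in engine, with hypothetical table and column names (the production scripts targeted Oracle):

    # Data-profiling sketch: row counts, distinct keys, and null counts.
    # Table/column names are hypothetical; production scripts targeted Oracle.
    import sqlite3  # stand-in engine so the sketch is self-contained

    PROFILE_SQL = """
    SELECT COUNT(*)                                        AS row_count,
           COUNT(DISTINCT customer_id)                     AS distinct_keys,
           SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END)  AS null_emails
    FROM customers
    """

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id INT, email TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "a@x.com"), (1, "a@x.com"), (2, None)])

    row_count, distinct_keys, null_emails = conn.execute(PROFILE_SQL).fetchone()
    # distinct_keys < row_count flags duplicate candidate keys.
    print(f"rows={row_count} distinct_keys={distinct_keys} null_emails={null_emails}")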
Environment: Oracle 10g/11g, MS Access 2007, MS Excel 2007, MS Word 2007, MS Outlook 2007, Erwin, Crystal Reports, PowerPoint 2007, SharePoint, data mapping
Confidential, West Des Moines, IA
Data Analyst
Responsibilities:
- Worked on full life cycle development (SDLC), involved in all stages of development.
- Participated in discussions on application creation, understood the requirements, and provided the back-end functionality for the applications.
- Created Entity Relationship (ER) diagrams for the proposed database.
- Worked on blocking and deadlocking and wrote code to avoid these situations.
- Created database objects: schemas, tables, indexes, views, user-defined functions, cursors, triggers, stored procedures, and constraints.
- Extensively used joins and sub-queries to simplify complex queries involving multiple tables.
- Designed, tested (unit and integration testing), and implemented stored procedures and triggers for processing huge volumes of data.
- Worked on the FTP component in SSIS to load files from a remote location to the server.
- Used various transformations in SSIS to load data from flat files, Access databases, XML, and Excel into SQL Server databases.
- Developed tabular queries for efficient report analysis using PIVOT/UNPIVOT in T-SQL (a PIVOT sketch follows this list).
- Created SQL Server Agent jobs to run the SSIS packages at scheduled times.
- Created package configurations to store package properties and to move packages from one environment to another so they can run independently.
- Created SSRS reports using report parameters, drop-down parameters, and multi-valued parameters; debugged parameter issues; and built matrix reports and charts.
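A sketch of running a T-SQL PIVOT query from Python via pyodbc; the connection string, table, and column names are hypothetical:

    # Sketch: running a T-SQL PIVOT query from Python with pyodbc.
    # Connection string, table, and column names are hypothetical.
    import pyodbc

    PIVOT_SQL = """
    SELECT region, [2011] AS y2011, [2012] AS y2012
    FROM (SELECT region, sale_year, amount FROM dbo.Sales) AS src
    PIVOT (SUM(amount) FOR sale_year IN ([2011], [2012])) AS p;
    """

    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};SERVER=server;DATABASE=db;"
        "Trusted_Connection=yes;"
    )
    # Each row now carries one column per pivoted year.
    for row in conn.cursor().execute(PIVOT_SQL):
        print(row.region, row.y2011, row.y2012)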
Environment: MS SQL Server Management Studio 2008/2008 R2/2012, Query Analyzer, Index Tuning Wizard, MS SQL Server Integration Services (SSIS), SQL Server Reporting Services (SSRS), VB.NET.