Data Engineer Resume
Boston, MA
SUMMARY
- Over 6 years of IT experience in the Big Data domain using various Hadoop ecosystem tools.
- Solid understanding of the architecture and workings of the Hadoop framework, including the Hadoop Distributed File System (HDFS) and its ecosystem components.
- Experienced in Data Modeling and Data Analysis; proficient in gathering business requirements and handling requirements management.
- Expertise in moving structured schema data between Pig and Hive using HCatalog.
- Experience working in Agile (Scrum) software development.
- Installation, configuration, and administration experience with Big Data platforms: Cloudera Manager (Cloudera) and MCS (MapR).
- Hands-on experience writing queries, stored procedures, functions, and triggers.
- Experience in data management and implementation of Big Data applications using Spark and Hadoop frameworks.
- Experience analyzing data using Spark SQL, HiveQL, and Pig Latin.
- Hands-on experience building streaming applications using Spark Streaming and Kafka with minimal to no data loss or duplication.
- Configured Spark Streaming to consume ongoing data from Kafka and store the stream to HDFS.
- Expert in database and RDBMS concepts, using MS Access, MS SQL Server, and Oracle.
- Heavy use of Access queries, VLOOKUP, formulas, Pivot Tables, etc.; working knowledge of CRM automation (Salesforce.com) and SAP.
- Expertise in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
- Extensive experience in development of T-SQL, Oracle PL/SQL Scripts, Stored Procedures and Triggers for business logic implementation.
- Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) tools.
- Involved in writing SQL queries and PL/SQL programs; created new packages and procedures, and modified and tuned existing procedures and queries using TOAD.
- Good Understanding and experience in Data Mining Techniques like Classification, Clustering, Regression and Optimization.
- Hands on experience with Amazon EC2 and multi-node clusters.
- Expert in writing SQL queries and optimizing the queries in Oracle, SQL Server.
- Experience in Data transformation, Data mapping from source to target database schemas, Data Cleansing procedures.
- Experience in Performance Tuning and query optimization techniques in transactional and Data Warehouse Environments.
- Involved in the analysis, development, and migration of stored procedures, triggers, views, and other related database objects.
- Proficient in R Programming Language, Data extraction, Data cleaning, Data Loading, Data Transformation, and Data visualization.
- Performed extensive data profiling and analysis to detect and correct inaccurate data in databases and to track data quality (a minimal SQL sketch of this kind of profiling follows this summary).
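The data profiling and validation work summarized above typically reduces to queries of the following shape. This is a minimal SQL sketch; the table and column names (customer_stg, customer_id, email) are hypothetical placeholders, not actual project objects.

```sql
-- Minimal profiling sketch against a hypothetical staging table:
-- row count, missing values, and duplicate business keys as data-quality signals.
SELECT
    COUNT(*)                                       AS total_rows,
    SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS missing_email,
    COUNT(*) - COUNT(DISTINCT customer_id)         AS duplicate_customer_ids
FROM customer_stg;

-- List the duplicate keys so they can be corrected at the source.
SELECT customer_id, COUNT(*) AS occurrences
FROM customer_stg
GROUP BY customer_id
HAVING COUNT(*) > 1;
```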
TECHNICAL SKILLS
Tools & Software: TOAD, MS Office, BTEQ, Teradata SQL Assistant
Cloud Services: Amazon AWS, EC2, Redshift, MS Azure
Other Tools: Teradata SQL Assistant, Toad 9.7/8.0, DB Visualizer 6.0, Microsoft Office, Microsoft Visio, Microsoft Excel, Microsoft Project
Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)
Data Modeling Tools: Erwin 9.7, ER/Studio v17, Sybase Power Designer
Big Data Technologies: Hadoop 3.0, Hive 3.1, HDFS, HBase 1.2, Apache Flume 1.8, Sqoop 1.4, Spark 2.4, Pig 0.17, Impala 3.0, and MapReduce MRv2/MRv1.
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED
Databases: Oracle 12c, Teradata R15, MS SQL Server 2017
Operating System: Windows 10/8, Unix, Sun Solaris
ETL/Data warehouse Tools: Informatica 9.6, SAP Business Objects XIR3.1/XIR2, Web Intelligence, Talend, Tableau, Pentaho
PROFESSIONAL EXPERIENCE
Confidential - Boston, MA
Data Engineer
Responsibilities:
- As a Data Engineer, provided technical expertise in Hadoop technologies as they relate to the development of analytics.
- Worked with clients to better understand their reporting and dashboarding needs and presented solutions using a structured Agile project methodology.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Installed, Configured and Maintained the Hadoop cluster for application development and Hadoop ecosystem components.
- Experienced with Microsoft Azure data storage, Azure Data Factory, and Azure Data Lake.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using star and snowflake schemas.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Used HIVE queries to import data into Microsoft Azure cloud and analyzed the data using HIVE scripts.
- Involved in creating Hive tables, loading data into them, and writing Hive queries to analyze the data.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.
- Implemented partitioning, dynamic partitions, and buckets in Hive to improve performance and organize data logically (see the HiveQL sketch after this list).
- Used Ambari on the Azure HDInsight cluster to record and manage the logs of the NameNode and DataNodes.
- Created tables in HBase to store variable data formats of PII data coming from different portfolios.
- Used cloud computing on the multi-node cluster, deployed the Hadoop application with data on S3, and used Elastic MapReduce (EMR) to run MapReduce jobs.
- Explored MLlib algorithms in Spark to understand the Machine Learning functionality that could be applied to the use case.
- In the preprocessing phase of data extraction, used Spark to remove missing data and transform the data to create new features.
- Exported the analyzed data to the NoSQL Database using HBase for visualization and to generate reports for the Business Intelligence team using SAS.
- Extensively used Star and Snowflake Schema methodologies.
- Used various HBase commands, generated different datasets as per requirements, and provided access to the data when required using Grant and Revoke.
- Created Hive tables as internal or external tables per requirements, designed for efficiency.
- Developed MapReduce programs for the files generated by Hive query processing to generate key-value pairs and upload the data to the NoSQL database HBase.
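The partitioning and bucketing work above follows the standard Hive pattern sketched below. This is an illustrative HiveQL sketch only; the table, column, and path names (web_events, event_date, user_id) are hypothetical and not the actual project schema.

```sql
-- Hive table partitioned by date and bucketed by user id for logically organized, faster scans.
CREATE EXTERNAL TABLE IF NOT EXISTS web_events (
    user_id    BIGINT,
    event_type STRING,
    payload    STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC
LOCATION '/data/web_events';

-- Enable dynamic partitioning and load from a raw staging table,
-- letting Hive route each row to its event_date partition.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE web_events PARTITION (event_date)
SELECT user_id, event_type, payload, event_date
FROM web_events_raw;
```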
Environment: Agile, Hadoop 3.0, Microsoft Azure, 3NF, Sqoop, Hive 3.1, Pig 0.17, HBase 1.3, MapReduce, NoSQL
Confidential - New Hyde Park, NY
Data Analyst/Data Engineer
Responsibilities:
- Participated in JAD sessions for defining business requirements and finalizing the required data fields and formats.
- Used Agile (SCRUM) methodologies for Software Development.
- Responsible for data mapping and data mediation between the source data table and target data tables.
- Designed and developed end-to-end ETL processing on AWS using Amazon S3, EMR, and Spark.
- Performed data analysis and data profiling on various source systems.
- Wrote complex SQL scripts and PL/SQL packages to extract data from various source tables of the data warehouse.
- Configured Apache Mahout Engine.
- Assisted in designing test plans, test scenarios and test cases for integration, regression and user acceptance testing.
- Developed the code to perform Data extractions from Oracle Database and load it into AWS platform using AWS Data Pipeline.
- Developed Scala scripts and UDFs using both SQL and RDDs in Spark.
- Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
- Performed Data Analysis and Data Manipulation of source data from SQL Server and other data structures to support the business organization.
- Responsible for building scalable distributed data solutions using Big Data technologies.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS (see the sketch after this list).
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop.
- Implemented the AWS cloud computing platform using S3, RDS, DynamoDB, Redshift, and Python.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Integrated AWS DynamoDB with AWS Lambda to store item values and back up DynamoDB Streams.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Continuously tuned Hive UDFs and queries for faster execution by employing partitioning and bucketing.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
- Performed data extraction, data analysis, data manipulation and prepared various production and ad-hoc reports to support cost optimization initiatives and strategies.
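The Hive extraction and reporting queries described above generally look like the sketch below: prune partitions by date, aggregate, and persist the result to HDFS. The database, table, and column names (lake.orders, event_date, order_amount) and the output path are hypothetical placeholders.

```sql
-- Aggregate a month of partitioned data-lake orders and persist the result to HDFS.
INSERT OVERWRITE DIRECTORY '/data/reports/daily_orders'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT
    event_date,
    COUNT(DISTINCT order_id) AS orders,
    SUM(order_amount)        AS revenue,
    AVG(order_amount)        AS avg_order_value
FROM lake.orders
WHERE event_date BETWEEN '2019-01-01' AND '2019-01-31'   -- partition pruning on event_date
GROUP BY event_date;
```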
Environment: HBase 1.2, Flume 1.9, Pig, Sqoop, Agile, AWS, PL/SQL, SQL, Apache Mahout 0.14, Hive 2.3, HDFS, Hadoop 3.0, Oozie 5.1, Dynamo DB
Confidential - Des Plaines, IL
Data Analyst
Responsibilities:
- Responsible for the analysis, design, development, coding, generation of reports using SQL, testing and documentation.
- Resolved the data type inconsistencies between the source systems and the target system using the mapping documents and analyzing the database using SQL queries.
- Created database maintenance plans for SQL Server performance, covering database integrity checks, updating database statistics, and re-indexing.
- Extracted data from production database and prepared financial reports.
- Developed and optimized stored procedures for use as a data window source for complex reporting purpose.
- Used MS Excel, SQL, and UNIX for weekly and monthly reporting.
- Responsible for developing and creating tables and views using DDL and DML.
- Modified UNIX shell scripts to automate pre-session and post-session tasks and BTEQ scripts.
- Used Excel Pivot Tables for data representation and presentation.
- Monitored existing code performance and changed the code for better performance.
- Used Inner Join and Outer join to retrieve data from multiple tables.
- Have used analytical skills and quantitative knowledge for problem solving.
- Wrote SQL scripts to run ad-hoc queries, PL/SQL scripts, stored procedures and triggers, and prepared reports for management.
- Designed automated reports through MySQL and Excel to reduce manual work.
- Developed PL/SQL programs and stored procedures for data loading and data validation (see the sketch after this list).
- Developed Oracle queries to replace current data warehouse reports.
- Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
- Created or modified T-SQL queries per business requirements.
- Performed data profiling and analysis, applied various data cleansing rules, and designed data standards.
- Performed Data Analysis and Data Manipulation of source data from SQL Server and other data structures to support the business organization.
- Extensively involved in Data Governance that involved data definition, data quality, rule definition, privacy and regulatory policies, auditing and access control.
- Designed and Developed Oracle database Tables, Views, Indexes and maintained the databases by deleting and removing old data.
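The PL/SQL load-and-validate work described above typically follows the pattern below. This is a compact, hypothetical sketch; the object names (stg_accounts, accounts, err_log) and the validation rule are illustrative assumptions rather than the actual procedures.

```sql
-- Hypothetical PL/SQL procedure: log rows that fail validation, then load the clean rows.
CREATE OR REPLACE PROCEDURE load_accounts AS
  v_rejected NUMBER := 0;
BEGIN
  -- Record rows failing the basic validation rule (missing business key).
  INSERT INTO err_log (source_table, reason, logged_at)
  SELECT 'STG_ACCOUNTS', 'Missing account_id', SYSDATE
  FROM stg_accounts
  WHERE account_id IS NULL;

  v_rejected := SQL%ROWCOUNT;

  -- Load only the rows that pass validation into the target table.
  INSERT INTO accounts (account_id, account_name, balance)
  SELECT account_id, account_name, NVL(balance, 0)
  FROM stg_accounts
  WHERE account_id IS NOT NULL;

  COMMIT;
  DBMS_OUTPUT.PUT_LINE('Rejected rows: ' || v_rejected);
END load_accounts;
/
```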
Environment: SQL, MS Excel, PL/SQL, MySQL, Oracle 11g, T-SQL
Confidential - Birmingham, AL
Data Analyst
Responsibilities:
- Involved in Business and Data analysis during requirements gathering.
- Gathered and analyzed business requirements through walkthroughs with the business owners.
- Used and supported database applications and tools for extraction, transformation, and analysis of raw data.
- Pulled data using SQL from various servers, including SQL Server.
- Performed statistical data analysis and data visualization using Python.
- Performed Data Analysis and Data Validation by writing SQL queries.
- Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
- Worked with the Business Analyst and DBA, conducting team meetings and JAD sessions for technical requirements gathering, business analysis, and testing and project coordination.
- Performed data analysis using SQL queries on source systems to identify data discrepancies and determine data quality (a minimal reconciliation sketch follows this list).
- Produced PL/SQL statements and stored procedures for extracting as well as writing data.
- Worked extensively in data analysis by querying in SQL and generating various PL/SQL objects.
- Proficient in SQL across a number of dialects, including MySQL, PostgreSQL, Redshift, SQL Server, and Oracle.
- Worked in importing and cleansing of data from various sources like flat files, MS SQL Server with high volume data.
- Worked and extracted data from various database sources.
- Used MS Visio for business flow diagrams and defined the workflow.
- Worked extensively on creating tables, Views, SQL stored procedures, functions, triggers and packages using PL/SQL.
- Created pivot tables and charts using worksheet data and external resources; modified pivot tables, sorted items, grouped data, and refreshed and formatted pivot tables.
- Performed in-depth data analysis and prepared weekly, biweekly, and monthly reports using SQL, SAS, and MS Access.
- Performed data analysis and data profiling using complex SQL on various source systems.
- Used MS Access for data pulls and ad-hoc reports for analysis.
- Analyzed data using SAS for automation and determined business data trends.
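The discrepancy and data-quality checks described above usually reduce to reconciliation queries of the following shape. The table names (src_orders, tgt_orders) and key column are hypothetical placeholders.

```sql
-- Compare row counts between a source and a target table to spot load discrepancies.
SELECT 'source' AS side, COUNT(*) AS row_count FROM src_orders
UNION ALL
SELECT 'target' AS side, COUNT(*) AS row_count FROM tgt_orders;

-- Identify source rows that never reached the target.
SELECT s.order_id
FROM src_orders s
LEFT JOIN tgt_orders t ON s.order_id = t.order_id
WHERE t.order_id IS NULL;
```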
Environment: SQL, PL/SQL, MySQL, MS Visio, SAS, MS Access