Data Engineer Resume
Boston, MA
SUMMARY
- Over 6 years of IT experience in the Big Data domain using various Hadoop ecosystem tools.
- Solid understanding of the architecture and workings of the Hadoop framework, including the Hadoop Distributed File System (HDFS) and its ecosystem components.
- Experienced in Data Modeling and Data Analysis; proficient in gathering business requirements and handling requirements management.
- Expertise in moving structured schema data between Pig and Hive using HCatalog.
- Experience working in Agile (Scrum) software development.
- Installation, configuration, and administration experience with Big Data platforms: Cloudera Manager (Cloudera) and MCS (MapR).
- Hands-on experience writing queries, stored procedures, functions, and triggers.
- Experience in data management and implementation of Big Data applications using Spark and Hadoop frameworks.
- Experience analyzing data using Spark SQL, HiveQL, and Pig Latin.
- Hands-on experience building streaming applications using Spark Streaming and Kafka with minimal to no data loss or duplication.
- Configured Spark Streaming to consume ongoing data from Kafka and store the stream to HDFS.
- Expert in database and RDBMS concepts, using MS Access, MS SQL Server, and Oracle.
- Heavy use of Access queries, VLOOKUP, formulas, Pivot Tables, etc.; working knowledge of CRM automation (Salesforce.com) and SAP.
- Expertise in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
- Extensive experience in development of T-SQL, Oracle PL/SQL Scripts, Stored Procedures and Triggers for business logic implementation.
- Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) tools.
- Involved in writing SQL queries and PL/SQL programs; created new packages and procedures, and modified and tuned existing procedures and queries using TOAD.
- Good Understanding and experience in Data Mining Techniques like Classification, Clustering, Regression and Optimization.
- Hands on experience with Amazon EC2 and multi-node clusters.
- Expert in writing SQL queries and optimizing the queries in Oracle, SQL Server.
- Experience in Data transformation, Data mapping from source to target database schemas, Data Cleansing procedures.
- Experience in Performance Tuning and query optimization techniques in transactional and Data Warehouse Environments.
- Involved in the analysis, development, and migration of stored procedures, triggers, views, and other related database objects.
- Proficient in R Programming Language, Data extraction, Data cleaning, Data Loading, Data Transformation, and Data visualization.
- Performed extensive data profiling and analysis to detect and correct inaccurate data in databases and to track data quality (a minimal SQL sketch of this kind of profiling follows this summary).
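The data profiling and validation work summarized above typically reduces to queries of the following shape. This is a minimal SQL sketch; the table and column names (customer_stg, customer_id, email) are hypothetical placeholders, not actual project objects.

```sql
-- Minimal profiling sketch against a hypothetical staging table:
-- row count, missing values, and duplicate business keys as data-quality signals.
SELECT
    COUNT(*)                                       AS total_rows,
    SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS missing_email,
    COUNT(*) - COUNT(DISTINCT customer_id)         AS duplicate_customer_ids
FROM customer_stg;

-- List the duplicate keys so they can be corrected at the source.
SELECT customer_id, COUNT(*) AS occurrences
FROM customer_stg
GROUP BY customer_id
HAVING COUNT(*) > 1;
```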
TECHNICAL SKILLS
Tools & Software: TOAD, MS Office, BTEQ, Teradata SQL Assistant
Cloud Services: Amazon AWS, EC2, Redshift, MS Azure
Other Tools: Teradata SQL Assistant, Toad 9.7/8.0, DB Visualizer 6.0, Microsoft Office, Microsoft Visio, Microsoft Excel, Microsoft Project
Project Execution Methodologies: Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)
Data Modeling Tools: Erwin 9.7, ER/Studio v17, Sybase Power Designer
Big Data Technologies: Hadoop 3.0, Hive 3.1, HDFS, HBase 1.2, Apache Flume 1.8, Sqoop 1.4, Spark 2.4, Pig 0.17, Impala 3.0, and MapReduce MRv2/MRv1.
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED
Databases: Oracle 12c, Teradata R15, MS SQL Server 2017
Operating System: Windows 10/8, Unix, Sun Solaris
ETL/Data warehouse Tools: Informatica 9.6, SAP Business Objects XIR3.1/XIR2, Web Intelligence, Talend, Tableau, Pentaho
PROFESSIONAL EXPERIENCE
Confidential - Boston, MA
Data Engineer
Responsibilities:
- As a Data Engineer, provided technical expertise in Hadoop technologies as they relate to the development of analytics.
- Worked with clients to better understand their reporting and dashboarding needs and presented solutions using a structured Agile project methodology.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
- Installed, Configured and Maintained the Hadoop cluster for application development and Hadoop ecosystem components.
- Experienced with Microsoft Azure data storage, Azure Data Factory, and Azure Data Lake.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using star and snowflake schemas.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Used HIVE queries to import data into Microsoft Azure cloud and analyzed the data using HIVE scripts.
- Involved in creating Hive tables, loading data into them, and writing Hive queries to analyze the data.
- Created and maintained Technical documentation for launching Hadoop Clusters and for executing Hive queries and Pig Scripts.
- Implemented partitioning, dynamic partitions, and buckets in Hive to improve performance and organize data logically (see the HiveQL sketch after this list).
- Used Ambari on the Azure HDInsight cluster to record and manage the logs of the NameNode and DataNodes.
- Created tables in HBase to store variable data formats of PII data coming from different portfolios.
- Used cloud computing on the multi-node cluster, deployed the Hadoop application with data on S3, and used Elastic MapReduce (EMR) to run MapReduce jobs.
- Explored MLlib algorithms in Spark to understand the Machine Learning functionality that could be applied to the use case.
- In the preprocessing phase of data extraction, used Spark to remove missing data and transform the data to create new features.
- Exported the analyzed data to the NoSQL Database using HBase for visualization and to generate reports for the Business Intelligence team using SAS.
- Extensively used Star and Snowflake Schema methodologies.
- Used various HBase commands, generated different datasets as per requirements, and provided access to the data when required using Grant and Revoke.
- Created Hive tables as internal or external tables per requirements, designed for efficiency.
- Developed MapReduce programs for the files generated by Hive query processing to generate key-value pairs and upload the data to the NoSQL database HBase.
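The partitioning and bucketing work above follows the standard Hive pattern sketched below. This is an illustrative HiveQL sketch only; the table, column, and path names (web_events, event_date, user_id) are hypothetical and not the actual project schema.

```sql
-- Hive table partitioned by date and bucketed by user id for logically organized, faster scans.
CREATE EXTERNAL TABLE IF NOT EXISTS web_events (
    user_id    BIGINT,
    event_type STRING,
    payload    STRING
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (user_id) INTO 32 BUCKETS
STORED AS ORC
LOCATION '/data/web_events';

-- Enable dynamic partitioning and load from a raw staging table,
-- letting Hive route each row to its event_date partition.
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;

INSERT OVERWRITE TABLE web_events PARTITION (event_date)
SELECT user_id, event_type, payload, event_date
FROM web_events_raw;
```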
Environment: Agile, Hadoop 3.0, Microsoft Azure, 3NF, Sqoop, Hive 3.1, Pig 0.17, HBase 1.3, MapReduce, NoSQL
Confidential - New Hyde Park, NY
Data Analyst/Data Engineer
Responsibilities:
- Participated in JAD sessions for defining business requirements and finalizing the required data fields and formats.
- Used Agile (SCRUM) methodologies for Software Development.
- Responsible for data mapping and data mediation between the source data table and target data tables.
- Designed and developed end-to-end ETL processing on AWS using Amazon S3, EMR, and Spark.
- Performed data analysis and data profiling on various source systems.
- Wrote complex SQL scripts and PL/SQL packages to extract data from various source tables of the data warehouse.
- Configured Apache Mahout Engine.
- Assisted in designing test plans, test scenarios and test cases for integration, regression and user acceptance testing.
- Developed the code to perform Data extractions from Oracle Database and load it into AWS platform using AWS Data Pipeline.
- Developed Scala scripts and UDFs using both SQL and RDDs in Spark.
- Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
- Performed Data Analysis and Data Manipulation of source data from SQL Server and other data structures to support the business organization.
- Responsible for building scalable distributed data solutions using Big Data technologies.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS (see the sketch after this list).
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Installed and configured Hadoop ecosystem components such as HBase, Flume, Pig, and Sqoop.
- Implemented the AWS cloud computing platform using S3, RDS, DynamoDB, Redshift, and Python.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Integrated AWS DynamoDB with AWS Lambda to store item values and back up DynamoDB Streams.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Continuously tuned Hive UDFs and queries for faster execution by employing partitioning and bucketing.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
- Performed data extraction, data analysis, data manipulation and prepared various production and ad-hoc reports to support cost optimization initiatives and strategies.
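The Hive extraction and reporting queries described above generally look like the sketch below: prune partitions by date, aggregate, and persist the result to HDFS. The database, table, and column names (lake.orders, event_date, order_amount) and the output path are hypothetical placeholders.

```sql
-- Aggregate a month of partitioned data-lake orders and persist the result to HDFS.
INSERT OVERWRITE DIRECTORY '/data/reports/daily_orders'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT
    event_date,
    COUNT(DISTINCT order_id) AS orders,
    SUM(order_amount)        AS revenue,
    AVG(order_amount)        AS avg_order_value
FROM lake.orders
WHERE event_date BETWEEN '2019-01-01' AND '2019-01-31'   -- partition pruning on event_date
GROUP BY event_date;
```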
Environment: HBase 1.2, Flume 1.9, Pig, Sqoop, Agile, AWS, PL/SQL, SQL, Apache Mahout 0.14, Hive 2.3, HDFS, Hadoop 3.0, Oozie 5.1, Dynamo DB
Confidential - Des Plaines, IL
Data Analyst
Responsibilities:
- Responsible for the analysis, design, development, coding, generation of reports using SQL, testing and documentation.
- Resolved the data type inconsistencies between the source systems and the target system using the mapping documents and analyzing the database using SQL queries.
- Created database maintenance plans for SQL Server performance, covering database integrity checks, updating database statistics, and re-indexing.
- Extracted data from production database and prepared financial reports.
- Developed and optimized stored procedures for use as a data window source for complex reporting purpose.
- Used MS Excel, SQL, and UNIX for weekly and monthly reporting.
- Responsible for developing and creating tables and views using DDL and DML.
- Modified UNIX shell scripts to automate pre-session and post-session tasks and BTEQ scripts.
- Used Excel Pivot Tables for data representation and presentation.
- Monitored existing code performance and changed the code for better performance.
- Used Inner Join and Outer join to retrieve data from multiple tables.
- Have used analytical skills and quantitative knowledge for problem solving.
- Wrote SQL scripts to run ad-hoc queries, PL/SQL scripts, stored procedures and triggers, and prepared reports for management.
- Designed automated reports through MySQL and Excel to reduce manual work.
- Developed PL/SQL programs and stored procedures for data loading and data validation (see the sketch after this list).
- Developed Oracle queries to replace current data warehouse reports.
- Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
- Created or modified T-SQL queries per business requirements.
- Performed data profiling and analysis, applied various data cleansing rules, and designed data standards.
- Performed Data Analysis and Data Manipulation of source data from SQL Server and other data structures to support the business organization.
- Extensively involved in Data Governance that involved data definition, data quality, rule definition, privacy and regulatory policies, auditing and access control.
- Designed and Developed Oracle database Tables, Views, Indexes and maintained the databases by deleting and removing old data.
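The PL/SQL load-and-validate work described above typically follows the pattern below. This is a compact, hypothetical sketch; the object names (stg_accounts, accounts, err_log) and the validation rule are illustrative assumptions rather than the actual procedures.

```sql
-- Hypothetical PL/SQL procedure: log rows that fail validation, then load the clean rows.
CREATE OR REPLACE PROCEDURE load_accounts AS
  v_rejected NUMBER := 0;
BEGIN
  -- Record rows failing the basic validation rule (missing business key).
  INSERT INTO err_log (source_table, reason, logged_at)
  SELECT 'STG_ACCOUNTS', 'Missing account_id', SYSDATE
  FROM stg_accounts
  WHERE account_id IS NULL;

  v_rejected := SQL%ROWCOUNT;

  -- Load only the rows that pass validation into the target table.
  INSERT INTO accounts (account_id, account_name, balance)
  SELECT account_id, account_name, NVL(balance, 0)
  FROM stg_accounts
  WHERE account_id IS NOT NULL;

  COMMIT;
  DBMS_OUTPUT.PUT_LINE('Rejected rows: ' || v_rejected);
END load_accounts;
/
```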
Environment: SQL, MS Excel, PL/SQL, MySQL, Oracle 11g, T-SQL
Confidential - Birmingham, AL
Data Analyst
Responsibilities:
- Involved in Business and Data analysis during requirements gathering.
- Gathered and analyzed business requirements through walkthroughs with the business owners.
- Used and supported database applications and tools for extraction, transformation, and analysis of raw data.
- Pulled data using SQL from various servers, including SQL Server.
- Performed statistical data analysis and data visualization using Python.
- Performed Data Analysis and Data Validation by writing SQL queries.
- Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
- Worked with the Business Analyst and DBA, conducting team meetings and JAD sessions for technical requirements gathering, business analysis, and testing and project coordination.
- Performed data analysis using SQL queries on source systems to identify data discrepancies and determine data quality (a minimal reconciliation sketch follows this list).
- Produced PL/SQL statements and stored procedures for extracting as well as writing data.
- Worked extensively in data analysis by querying in SQL and generating various PL/SQL objects.
- Proficient in SQL across a number of dialects, including MySQL, PostgreSQL, Redshift, SQL Server, and Oracle.
- Worked in importing and cleansing of data from various sources like flat files, MS SQL Server with high volume data.
- Worked and extracted data from various database sources.
- Used MS Visio for business flow diagrams and defined the workflow.
- Worked extensively on creating tables, Views, SQL stored procedures, functions, triggers and packages using PL/SQL.
- Created pivot tables and charts using worksheet data and external resources; modified pivot tables, sorted items, grouped data, and refreshed and formatted pivot tables.
- Performed in-depth data analysis and prepared weekly, biweekly, and monthly reports using SQL, SAS, and MS Access.
- Performed data analysis and data profiling using complex SQL on various source systems.
- Used MS Access for data pulls and ad-hoc reports for analysis.
- Analyzed data using SAS for automation and determined business data trends.
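The discrepancy and data-quality checks described above usually reduce to reconciliation queries of the following shape. The table names (src_orders, tgt_orders) and key column are hypothetical placeholders.

```sql
-- Compare row counts between a source and a target table to spot load discrepancies.
SELECT 'source' AS side, COUNT(*) AS row_count FROM src_orders
UNION ALL
SELECT 'target' AS side, COUNT(*) AS row_count FROM tgt_orders;

-- Identify source rows that never reached the target.
SELECT s.order_id
FROM src_orders s
LEFT JOIN tgt_orders t ON s.order_id = t.order_id
WHERE t.order_id IS NULL;
```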
Environment: SQL, PL/SQL, MySQL, MS Visio, SAS, MS Access