Sr. Hadoop / Big Data Engineer Resume
New York, NY
SUMMARY:
- Results-driven IT professional with 9+ years of experience in Data Analysis, Data Modeling, and Big Data, applying information technology to business problems.
- Thorough understanding of the end-to-end Software Development Life Cycle process in an Agile environment using Scrum methodologies.
- Excellent understanding of Data Modeling (Dimensional & Relational) concepts such as Star Schema modeling, Snowflake Schema modeling, and Fact and Dimension tables.
- Strong knowledge of developing Big Data solutions spanning data ingestion and data storage.
- Good knowledge of cloud technologies such as Azure and AWS (EMR, S3, Redshift, EC2, DynamoDB).
- Proficient in technical consulting and end-to-end delivery covering data analysis, data modeling, data governance, and the design, development, and implementation of solutions.
- Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
- Experience in configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
- Experience in NoSQL databases - HBase, Cassandra & MongoDB, database performance tuning & data modeling.
- Proficient in using PL/SQL to write Stored Procedures, Functions, and Triggers.
- Extensive experience in Extraction, Transformation and Loading (ETL) of data from multiple sources into Data Warehouse and Data Mart.
- Effective in addressing complex POCs from the technical end according to business requirements.
- Experience in importing and exporting Terabytes of data between HDFS and Relational Database Systems using Sqoop.
- Extensive experience using ER modeling tools such as Erwin and ER/Studio, as well as Teradata, BTEQ, and MDM.
- Good experience on different ETL tool environments like SSIS, Informatica and reporting tool environments like SQL Server Reporting Services (SSRS), Cognos and Business Objects.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Good knowledge of Normalization and De-normalization concepts and design methodologies such as Ralph Kimball's and Bill Inmon's Data Warehouse methodologies.
- Experience in Text Analytics, developing various Statistical Machine Learning and Data Mining solutions to business problems.
- Expertise in integration of various data sources like RDBMS, Spreadsheets, Text files, and XML files.
- Excellent experience in the development of Big Data projects using Hadoop, Hive, HDP, Pig, Flume, Storm, and MapReduce open-source tools/technologies.
- Good working knowledge of UNIX commands, such as changing file and group permissions.
- Strong background in mathematics with very good analytical and problem-solving skills.
TECHNICAL SKILLS:
Big Data Ecosystem: Map Reduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11
Cloud Management: Amazon Web Services (AWS), Amazon RedShift
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Programming Languages: SQL, PL/SQL, UNIX Shell Scripting
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2
Testing & Defect Tracking Tools: HP Mercury Quality Center, WinRunner, MS Visio 2016 & Visual SourceSafe
Operating System: Windows 7/8/10, UNIX, Sun Solaris
ETL/Data warehouse Tools: Informatica v10, SAP Business Objects Business Intelligence 4.2 Service Pack 03, Talend, Tableau, and Pentaho
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model
Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6
PROFESSIONAL EXPERIENCE:
Confidential - New York, NY
Sr. Hadoop / Big Data Engineer
Roles & Responsibilities:
- Participated in Sprint review/retrospective meetings and daily Scrum meetings, and gave the daily status report.
- Involved in all phases of the Software Development Life Cycle (SDLC) using methodologies such as Waterfall and Agile Scrum.
- Responsible for developing, troubleshooting and implementing programs.
- Designed and implemented scalable Cloud Data and Analytical architecture solutions for various public and private cloud platforms using Azure.
- Extensively used Pig for data cleansing with standalone and embedded Pig scripts.
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
- Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Oozie workflow jobs to execute hive, Sqoop and Map Reduce actions.
- Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Created external tables pointing to HBase to access tables with a huge number of columns.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and wrote Hive queries for analysis.
- Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by business users.
- Involved in converting MapReduce programs into Spark transformations using the Spark Python API (see the sketch after this list).
- Worked with NoSQL databases like HBase in creating tables to load large sets of semi structured data coming from source systems.
- Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
- Created data pipelines using processor groups and multiple processors in Apache NiFi for flat-file and RDBMS sources as part of a POC on Amazon EC2.
- Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies using Hadoop, MapReduce, HBase, Hive and Cloud Architecture.
- Involved in importing and exporting data between RDBMS and HDFS using Sqoop.
- Managed data from various file systems into HDFS using UNIX command-line utilities.
- Implemented monitoring and established best practices around the usage of Elasticsearch.
- Worked on a POC to perform sentiment analysis of Twitter data using Spark Streaming.
- Migrated the required data from MySQL into HDFS using Sqoop and imported flat files in various formats into HDFS.
- Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
- Worked on Cassandra for retrieving data from Cassandra clusters to run queries.
- Worked on Apache Nifi as ETL tool for batch processing and real time processing.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Extracted files from Cassandra through Sqoop and placed in HDFS for further processing.
- Created Hive tables on top of the loaded data and wrote Hive queries for ad-hoc analysis.
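The snippet below is a minimal, hypothetical PySpark sketch of the MapReduce-to-Spark conversion pattern referenced above; the HDFS path, schema, column names, and Hive table name are placeholders, and it assumes a Hive-enabled Spark session rather than reproducing the original production code.

```python
# Minimal PySpark sketch: a MapReduce-style filter + aggregate expressed as
# Spark transformations, with the result loaded into a partitioned Hive table.
# Paths, database, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("mapreduce-to-spark-sketch")
    .enableHiveSupport()          # assumes a configured Hive metastore
    .getOrCreate()
)

# Read raw delimited data from HDFS (placeholder path and schema).
raw = spark.read.csv(
    "hdfs:///data/raw/events", sep="\t",
    schema="event_id STRING, event_type STRING, event_dt STRING",
)

# Equivalent of the map + reduce phases: drop bad rows, then aggregate.
cleaned = raw.filter(F.col("event_id").isNotNull())
counts = cleaned.groupBy("event_dt", "event_type").agg(F.count("*").alias("cnt"))

# Write into a Hive table partitioned by date for downstream ad-hoc queries.
(counts.write
    .mode("overwrite")
    .partitionBy("event_dt")
    .saveAsTable("analytics.event_counts"))
```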
Environment: Hadoop 3.0, HDFS, Apache Hive 2.3, MapReduce, Apache NiFi 1.6, Apache Pig 0.17, MongoDB, Sqoop 1.4, SQL, Oracle 12c, PL/SQL, Agile, Azure, YARN, Oozie 4.3, ETL
Confidential - Greensboro, NC
Sr. Big Data Engineer
Roles & Responsibilities:
- Provided technical expertise and guidance on Big Data technologies as they relate to the development of analytics.
- Participated in JAD meetings to gather the requirements and understand the End Users System.
- Participated in requirements sessions to gather requirements along with business analysts and product owners.
- Worked in an Agile development methodology as an active member in Scrum meetings.
- Involved in the design, development, and testing phases of the Software Development Life Cycle (SDLC).
- Installed and configured Hive, wrote Hive UDFs, and managed cluster coordination services through ZooKeeper.
- Architected, Designed and Developed Business applications and Data marts for reporting.
- Involved in different phases of the development life cycle, including Analysis, Design, Coding, Unit Testing, Integration Testing, Review, and Release, as per the business requirements.
- Developed Big Data solutions focused on pattern matching and predictive modeling.
- Installed and configured Hadoop Ecosystem components.
- Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
- Wrote Hadoop jobs for analyzing data using Hive and Pig, accessing text files, sequence files, and Parquet files.
- Worked on the platform using Hive, Sqoop, and HBase; this effort showcased the benefits of Hortonworks.
- Prepared process flow/activity diagrams for the existing system using MS Visio and re-engineered the design based on business requirements.
- Utilized Hadoop, Hive and SQL technologies and moved data sets to production to be utilized by business teams to make business decisions.
- Designed and developed a real-time stream processing application using Kafka, Scala, and Hive to perform streaming ETL and apply Machine Learning (a simplified sketch follows this list).
- Worked closely with business analyst for requirement gathering and translating into technical documentation.
- Implemented a proof of concept deploying this product in Amazon Web Services AWS.
- Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Handled performance requirements for databases in OLTP and OLAP models.
- Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
- Involved in manipulating, cleansing, and processing data using Excel, Access, and SQL, and responsible for loading, extracting, and validating client data.
- Wrote SQL scripts to run ad-hoc queries, Stored Procedures & Triggers and prepare reports to the management.
- Designed components using UML Use Case, Class, Sequence, Deployment, and Component diagrams for the requirements.
- Used Excel sheets, flat files, and CSV files to generate Tableau ad-hoc reports.
- Involved in report development using reporting tools like Tableau.
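Below is a minimal sketch of the streaming-ETL pattern described above, written here with the Spark Python API rather than the Scala used in the original application; the broker address, topic, and column names are hypothetical, and the job assumes the spark-sql-kafka connector package is on the classpath.

```python
# Minimal PySpark Structured Streaming sketch of a Kafka streaming-ETL job.
# Hypothetical broker, topic, and schema; requires the spark-sql-kafka
# connector (e.g. via --packages) to be available at submit time.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StringType, DoubleType

spark = (SparkSession.builder
         .appName("kafka-streaming-etl-sketch")
         .getOrCreate())

schema = (StructType()
          .add("order_id", StringType())
          .add("amount", DoubleType()))

# Read the raw Kafka stream and parse the JSON payload.
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "orders")
          .load()
          .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
          .select("e.*"))

# Simple streaming transformation: running totals per order.
totals = events.groupBy("order_id").agg(F.sum("amount").alias("total_amount"))

# For the sketch, write to the console; a production job would persist to Hive.
query = (totals.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```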
Environment: PL/SQL, Agile, AWS, SQL, HDFS, SSAS, SSRS, Sqoop 1.2, Apache Pig 0.16, Kafka, Scala 1.4, MapReduce, OLAP, OLTP, HBase, Amazon Redshift, MS Visio
Confidential - McLean, VA
Data Analyst/Data Engineer
Roles & Responsibilities:
- Worked with the analysis teams and management teams and supported them based on their requirements.
- Involved in the extraction, transformation, and loading of data directly from different source systems (flat files/Excel/Oracle/SQL/Teradata) using SAS/SQL and SAS macros.
- Generated PL/SQL scripts for data manipulation, validation and materialized views for remote instances.
- Conducted JAD sessions with stakeholders and software development team to analyze the feasibility of needs.
- Extensive experience with Agile and Scrum programming methodologies.
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into HDFS.
- Performed Data Analysis and Data Manipulation of source data from SQL Server and other data structures to support the business organization.
- Performed data analysis and data profiling on various source systems including Oracle, SQL Server, and DB2.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Involved in all phases of data mining, data collection, data cleaning, developing models, validation and visualization.
- Designed and developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Designed both 3NF data models for ODS and OLTP systems and dimensional data models using Star and Snowflake schemas.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Developed and implemented logical and physical data models using the enterprise modeling tool Erwin.
- Created data models for AWS Redshift, Hive, and HBase from dimensional data models.
- Created data validation rules using SQL to validate the structure and integrity of the extracted data (see the sketch after this list).
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
- Reverse-engineered data models from database instances and scripts.
- Worked on data mapping and data mediation between the source data table and target data tables using MS Access and MS Excel.
- Performed data extraction, data analysis, data manipulation and prepared various production and ad-hoc reports to support cost optimization initiatives and strategies.
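The following is a minimal sketch of SQL-based data validation rules of the kind mentioned above, expressed here through Spark SQL against Hive tables; the database, table, and column names are hypothetical placeholders and the actual rules were driven by the source systems involved.

```python
# Minimal PySpark sketch of SQL data validation rules: null keys, duplicate
# keys, and orphaned fact rows. All object names are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("data-validation-sketch")
         .enableHiveSupport()     # assumes access to the Hive metastore
         .getOrCreate())

checks = {
    # Primary key must be populated.
    "null_keys": """
        SELECT COUNT(*) AS failures
        FROM staging.customer WHERE customer_id IS NULL
    """,
    # Primary key must be unique.
    "duplicate_keys": """
        SELECT COUNT(*) AS failures FROM (
            SELECT customer_id FROM staging.customer
            GROUP BY customer_id HAVING COUNT(*) > 1
        ) d
    """,
    # Every fact row must reference an existing customer.
    "orphan_facts": """
        SELECT COUNT(*) AS failures
        FROM staging.orders o
        LEFT JOIN staging.customer c ON o.customer_id = c.customer_id
        WHERE c.customer_id IS NULL
    """,
}

for name, sql in checks.items():
    failures = spark.sql(sql).collect()[0]["failures"]
    status = "PASS" if failures == 0 else f"FAIL ({failures} rows)"
    print(f"{name}: {status}")
```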
Environment: HDFS, Sqoop, Apache Pig, AWS, Amazon Redshift, Apache Hive, MapReduce, ODS, OLTP, HBase, SQL Server
Confidential - Dallas, TX
Data Analyst/Data Modeler
Roles & Responsibilities:
- Supported business analysis and marketing campaign analytics with data mining, data processing, and investigation to answer complex business questions.
- Designed ER diagrams, logical models, and physical databases for Oracle and Teradata as per business requirements using Erwin.
- Designed a third normal form (3NF) target data model and mapped it to the logical model.
- Developed Data mapping Transformation and Cleansing rules for the Master Data Management involving OLTP and OLAP.
- Used MS Access, MS Excel, Pivot tables and charts, MS PowerPoint, MS Outlook, MS Communicator and User Base to perform responsibilities.
- Designed reports in Access and Excel using advanced functions including, but not limited to, pivot tables and formulas.
- Used SQL and PL/SQL to validate the data going into the data warehouse.
- Wrote complex SQL and PL/SQL testing scripts for back-end testing of the data warehouse application; expert in writing complex SQL/PL-SQL scripts for querying Teradata and Oracle (a simplified reconciliation sketch follows this list).
- Designed and developed the data dictionary and metadata of the models and maintained them.
- Involved in extensive data validation using SQL queries and back-end testing.
- Tested the database to check field size validation, check constraints, and stored procedures, and cross-verified the field sizes defined within the application against the metadata.
- Designed and developed cubes using SQL Server Analysis Services (SSAS) in Microsoft Visual Studio.
- Provided PL/SQL queries to developers as source queries to identify the data, along with the logic to assign it.
- Designed the data marts in dimensional data modeling using star and snowflake schemas.
- Analyzed and presented the gathered information in graphical format for the ease of business managers.
- Developed and maintained the data dictionaries, Naming Conventions, Standards, and Class words Standards Document.
- Performed GAP analysis to analyze the difference between the system capabilities and business requirements.
- Produced Source to target data mapping by developing the mapping spreadsheets.
- Created documentation and test cases, worked with users for new module enhancements and testing.
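Below is a minimal, runnable sketch of the back-end reconciliation checks described above; sqlite3 stands in for Teradata/Oracle purely so the example runs anywhere, and the table and column names are hypothetical rather than taken from the original warehouse.

```python
# Minimal sketch of warehouse back-end testing: row-count and control-total
# reconciliation between a staging (source) and warehouse (target) table.
# sqlite3 is used only so the sketch is self-contained and runnable.
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Toy source (staging) and target (warehouse) tables with sample rows.
cur.executescript("""
    CREATE TABLE stg_sales (sale_id INTEGER, amount REAL);
    CREATE TABLE dw_sales  (sale_id INTEGER, amount REAL);
    INSERT INTO stg_sales VALUES (1, 100.0), (2, 250.0), (3, 75.5);
    INSERT INTO dw_sales  VALUES (1, 100.0), (2, 250.0), (3, 75.5);
""")

# Rule 1: row counts must match between source and target.
src_rows = cur.execute("SELECT COUNT(*) FROM stg_sales").fetchone()[0]
tgt_rows = cur.execute("SELECT COUNT(*) FROM dw_sales").fetchone()[0]
print("row count check:", "PASS" if src_rows == tgt_rows else "FAIL")

# Rule 2: control totals (sum of amounts) must reconcile.
src_amt = cur.execute("SELECT SUM(amount) FROM stg_sales").fetchone()[0]
tgt_amt = cur.execute("SELECT SUM(amount) FROM dw_sales").fetchone()[0]
print("amount total check:", "PASS" if src_amt == tgt_amt else "FAIL")

con.close()
```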
Environment: Erwin 9.5, Agile, Oracle 11g, ETL, SSIS, Teradata, SSAS, PL/SQL, OLTP, OLAP, DBA, AWS
Confidential
Data Analyst
Roles & Responsibilities:
- Worked with Data Analysts to understand Business logic and User Requirements.
- Closely worked with cross functional Data warehouse members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
- Created reports for the Data Analysis using SQL Server Reporting Services.
- Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
- Created SQL queries to simplify migration progress reports and analyses.
- Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation depending on data needed from various relational customer databases.
- Developed Stored Procedures in SQL Server to consolidate common DML transactions such as insert, update and delete from the database.
- Developed reporting and various dashboards across all areas of the client's business to help analyze the data.
- Cleansed and manipulated data by subsetting, sorting, and pivoting on a need basis (see the sketch after this list).
- Used SQL Server and MS Excel on daily basis to manipulate the data for business intelligence reporting needs.
- Developed stored procedures, user-defined functions, and triggers as needed using T-SQL.
- Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
- Created reports from OLAP sources, including subreports, bar charts, and matrix reports, using SSRS.
- Developed ad-hoc reports using VLOOKUPs, pivot tables, and macros in Excel, and recommended solutions to drive business decision making.
- Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
- Designed Ad-hoc reports using SQL and Tableau dashboards, facilitating data driven decisions for business users.
- Extracted data from different sources performing Data Integrity and quality checks.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Involved in extensive data validation by writing several complex SQL queries and Involved in back-end testing and worked with data quality issues.
- Collected, analyzed, and interpreted complex data for reporting and/or performance trend analysis.
- Performed Data Manipulation using MS Excel Pivot Sheets and produced various charts for creating the mock reports.
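The following is a minimal pandas sketch of the subset/sort/pivot cleanup pattern described above (the original work was done with Excel and SQL Server rather than Python); the sample data and column names are hypothetical.

```python
# Minimal pandas sketch: cleanse, subset, sort, and pivot, roughly the same
# pattern as an Excel pivot-table workflow. Data and names are hypothetical.
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q2"],
    "revenue": [120.0, 135.5, 98.0, None, 110.0],
})

# Cleanse: drop rows with missing revenue, then subset and sort.
clean = (sales.dropna(subset=["revenue"])
              .query("revenue > 0")
              .sort_values(["region", "quarter"]))

# Pivot: regions as rows, quarters as columns, summed revenue as values.
report = clean.pivot_table(index="region", columns="quarter",
                           values="revenue", aggfunc="sum", fill_value=0)
print(report)
```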
Environment: SQL Server, MS Excel, VLOOKUP, T-SQL, SSRS, SSIS, OLAP, PowerPoint