Sr. Big Data Engineer Resume
Philadelphia, PA
SUMMARY:
- Overall 8+ years of IT experience in the implementation and operation of multi-tiered, distributed applications and web-based applications as a Big Data Engineer and Data Modeler/Analyst.
- Experience in NoSQL databases like HBase, Cassandra & MongoDB, database performance tuning & data modeling.
- Work in a fast-paced agile development environment to quickly analyze, develop, and test potential use cases for the business.
- Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
- Developed and built frameworks/prototypes that integrate Big Data and advanced analytics to support business decisions.
- Experience loading data through ETL into SQL Server, Oracle, and other relational and non-relational databases.
- Experience in RDBMS (Oracle) PL/SQL, SQL, Stored Procedures, Functions, Packages, Triggers.
- Expertise in Relational data modeling (3NF) and Dimensional data modeling.
- Practical understanding of the Data modeling (Dimensional & Relational) concepts like Star-Schema Modeling, Snowflake Schema Modeling, Fact and Dimension tables.
- Assist application development teams during application design and development for highly complex and critical data projects
- Work closely with development, test, documentation and product management teams to deliver high quality products and services in a fast paced environment
- Experience in Data Profiling, Data Analysis, Data Cleansing and Data Masking.
- Excellent understanding of Hadoop Architecture and underlying Hadoop Framework including Storage Management.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Experience in analyzing data using HiveQL and custom MapReduce programs
- Proficient in data mart design, creation of cubes, identifying facts & dimensions, and star & snowflake schemas.
- Expertise in writing Hadoop Jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
- Experience with Data Warehouse Netezza and have worked extensively on PL/SQL.
- Comprehensive knowledge and experience in process improvement, normalization/de-normalization.
- Hands on experience in installing, configuring, and using Hadoop components like Hadoop Map Reduce, HDFS, HBase, Hive, Sqoop and Flume.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics (see the HiveQL sketch after this list).
- Hands-on experience in Linux shell scripting and in working with the Cloudera Big Data distribution.
- Good understanding of Machine Learning and statistical analysis with MATLAB.
- Configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
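A minimal HiveQL sketch of the kind of trend-spotting query described above; the table and column names (stg_daily_sales for the fresh load, edw_product_dim for the EDW reference, sales_history_agg for historical metrics) are illustrative assumptions, not an actual schema:

```sql
-- Illustrative names only: stg_daily_sales (fresh load), edw_product_dim (EDW
-- reference), sales_history_agg (historical daily averages).
SELECT d.product_id,
       p.category,
       SUM(d.sale_amount)                          AS current_day_sales,
       AVG(h.avg_daily_sales)                      AS trailing_avg_sales,
       SUM(d.sale_amount) / AVG(h.avg_daily_sales) AS lift_ratio
FROM stg_daily_sales d
JOIN edw_product_dim p
  ON d.product_id = p.product_id
LEFT JOIN sales_history_agg h
  ON d.product_id = h.product_id
GROUP BY d.product_id, p.category
HAVING SUM(d.sale_amount) > 1.5 * AVG(h.avg_daily_sales);
```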
TECHNICAL SKILLS:
Big Data / Hadoop Ecosystem: MapReduce, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Neo4j, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11
Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6.
Cloud Management: Amazon Web Services(AWS), Amazon Redshift
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2
Testing and defect tracking Tools: HP/Mercury, Quality Center, Win Runner, MS Visio & Visual Source Safe
Operating System: Windows, Unix, Sun Solaris
ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model
PROFESSIONAL EXPERIENCE:
Confidential - Philadelphia, PA
Sr. Big Data Engineer
Responsibilities:
- As a Sr. Big Data Engineer, provided technical expertise and guidance on Hadoop technologies as they relate to the development of analytics.
- Responsible for the planning and execution of big data analytics, predictive analytics and machine learning initiatives.
- Assisted in leading the plan, building, and running states within the Enterprise Analytics Team.
- Engaged in solving and supporting real business issues by applying knowledge of Hadoop distributed file systems and open-source frameworks.
- Performed detailed analysis of business problems and technical environments and used the findings to design solutions and maintain the data architecture.
- Designed and developed software applications, testing, and building automation tools.
- Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
- Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
- Conducted performance tuning of Hadoop clusters while monitoring and managing Hadoop cluster job performance, capacity forecasting, and security.
- Built data platforms, pipelines, and storage systems using Apache Kafka, Apache Storm, and search technologies such as Elasticsearch.
- Led the architecture and design of data processing, warehousing, and analytics initiatives.
- Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies using Hadoop, MapReduce, HBase, Hive and Cloud Architecture.
- Worked on implementation and maintenance of Cloudera Hadoop cluster.
- Created Hive external tables to stage data and then moved the data from staging to main tables (see the HiveQL sketch after this list).
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into the HDFS system.
- Pulled data from the data lake (HDFS) and transformed it with various RDD transformations.
- Active involvement in design, new development and SLA based support tickets of Big Machines applications.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Developed Oozie workflow jobs to execute hive, Sqoop and MapReduce actions.
- Provided thought leadership for architecture and the design of Big Data Analytics solutions for customers, actively drive Proof of Concept (POC) and Proof of Technology (POT) evaluations and to implement a Big Data solution.
- Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
- Created Data Pipeline using Processor Groups and multiple processors using Apache Nifi for Flat File, RDBMS as part of a POC using Amazon EC2.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory computation to generate the output response.
- Developed complete end to end Big-data processing in Hadoop eco-system.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
- Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
- Conducted proofs of concept to determine the feasibility of and evaluate Big Data products.
- Wrote Hive join queries to fetch information from multiple tables and multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard
- Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Developed customized classes for serialization and De-serialization in Hadoop.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Implemented a proof of concept deploying this product in Amazon Web Services AWS.
- Involved in migration of data from existing RDBMS (Oracle and SQL server) to Hadoop using Sqoop for processing data
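A hedged HiveQL sketch of the staging-to-main-table flow mentioned above (an external staging table over Sqoop-landed files, then an insert into a date-partitioned managed table); the table, column, and path names are assumptions for illustration:

```sql
-- Illustrative external staging table over files landed in HDFS.
CREATE EXTERNAL TABLE IF NOT EXISTS stg_transactions (
  txn_id     STRING,
  account_id STRING,
  txn_amount DOUBLE,
  txn_ts     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/staging/transactions';

-- Move refined rows into a date-partitioned main table (assumed to exist).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE transactions PARTITION (txn_date)
SELECT txn_id,
       account_id,
       txn_amount,
       txn_ts,
       to_date(txn_ts) AS txn_date
FROM stg_transactions;
```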
Environment: Hadoop 3.0, MapReduce, HBase, Hive 2.3, Informatica, HDFS, Scala 2.12, Spark, Sqoop 1.4, Apache Nifi, AWS, EC2, SQL Server, Oracle 12c.
Confidential - Charlotte, NC
Sr. Big Data Engineer
Responsibilities:
- As a Sr. Big Data Engineer, provided technical expertise and guidance on Hadoop technologies as they relate to the development of analytics.
- Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
- Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, MapReduce, Shell Scripting, Hive.
- Used Agile (SCRUM) methodologies for Software Development.
- Conducted performance tuning of Hadoop clusters while monitoring and managing Hadoop cluster job performance, capacity forecasting, and security.
- Used Big Data analytics technologies and applications in business intelligence analyses.
- Responsible for the planning and execution of big data analytics, predictive analytics and machine learning initiatives.
- Assisted in leading the plan, building, and running states within the Enterprise Analytics Team.
- Engaged in solving and supporting real business issues by applying knowledge of Hadoop distributed file systems and open-source frameworks.
- Performed detailed analysis of business problems and technical environments and used the findings to design solutions and maintain the data architecture.
- Designed and developed software applications, testing, and building automation tools.
- Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences
- Worked in exporting data from Hive tables into Netezza database.
- Developed analytics enablement layer using ingested data that facilitates faster reporting and dashboards.
- Worked with production support team to provide necessary support for issues with CDH cluster and the data ingestion platform.
- Led the architecture and design of data processing, warehousing, and analytics initiatives.
- Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies using Hadoop, MapReduce, HBase, Hive and Cloud Architecture.
- Worked on implementation and maintenance of Cloudera Hadoop cluster.
- Created Hive external tables to stage data and then moved the data from staging to main tables.
- Developed Oozie workflow jobs to execute Hive, Sqoop, and MapReduce actions.
- Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
- Created Data Pipeline using Processor Groups and multiple processors using Apache Nifi for Flat File, RDBMS as part of a POC using Amazon EC2.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory computation to generate the output response.
- Developed complete end to end Big-data processing in Hadoop eco-system.
- Performed File system management and monitoring on Hadoop log files.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Used Flume to collect, aggregate, and store the web log data from various sources like web servers, mobile and network devices and pushed to HDFS.
- Wrote Hive join queries to fetch information from multiple tables and multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze data ingested into HBase via the Hive-HBase integration and computed various metrics for reporting on the dashboard (see the HiveQL sketch after this list).
- Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Involved in migration of data from existing RDBMS (Oracle and SQL server) to Hadoop using Sqoop for processing data.
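A sketch of the Hive-HBase integration pattern referenced above, assuming a hypothetical HBase table web_events with a single column family m; all table and column names are illustrative:

```sql
-- Map a Hive external table onto an existing HBase table via the storage handler.
CREATE EXTERNAL TABLE IF NOT EXISTS hbase_web_events (
  event_key   STRING,
  user_id     STRING,
  page        STRING,
  duration_ms BIGINT
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,m:user_id,m:page,m:duration_ms")
TBLPROPERTIES ("hbase.table.name" = "web_events");

-- Dashboard metrics computed over the HBase-backed table.
SELECT page,
       COUNT(DISTINCT user_id) AS unique_users,
       AVG(duration_ms)        AS avg_duration_ms
FROM hbase_web_events
GROUP BY page;
```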
Environment: Hadoop 3.0, MapReduce, HBase, Hive 2.3, Informatica, HDFS, Sqoop 1.4, Apache Nifi, AWS, EC2, SQL Server, Oracle 12c, Apache Flume
Confidential - Mt Laurel, NJ
Sr. Big Data Consultant
Responsibilities:
- Worked as a Big Data implementation engineer within a team of professionals.
- Used Agile methodology for data warehouse development, managed with Kanbanize.
- Worked with Hadoop Ecosystem components like HBase, Sqoop, Zookeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
- Developed data pipeline using Spark, Hive and HBase to ingest customer behavioral data and financial histories into Hadoop cluster for analysis.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Implemented partitioning, dynamic partitions, and buckets in Hive (see the HiveQL sketch after this list).
- Involved in different phases of the development life cycle, including Analysis, Design, Coding, Unit Testing, Integration Testing, Review, and Release, as per the business requirements.
- Developed Big Data solutions focused on pattern matching and predictive modeling
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
- Created Hive external tables to stage data and then moved the data from staging to main tables.
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into the HDFS system.
- Pulled data from the data lake (HDFS) and transformed it with various RDD transformations.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Created Data Pipeline using Processor Groups and multiple processors using Apache Nifi for Flat Files, RDBMS as part of a POC using Amazon EC2.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory computation to generate the output response.
- Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
- Used AWS Cloud with Infrastructure Provisioning / Configuration.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Worked on configuring and managing disaster recovery and backup on Cassandra Data.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Continuously tuned Hive UDFs and queries for faster execution by employing partitioning and bucketing.
- Developed various QlikView data models by extracting and using data from various sources, including Excel files, Big Data sources, and flat files.
- Reviewed requirements with the QA Manager and ETL leads to enhance the data warehouse for the originations and servicing systems.
- Enforced referential integrity in the OLTP data model for consistent relationships between tables and efficient database design.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
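A minimal HiveQL sketch of the partitioning and bucketing approach noted above (dynamic partitions plus bucketing on the join key); table and column names such as customer_behavior and stg_customer_events are assumptions:

```sql
-- Enable dynamic partitioning for the load below.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Partitioned by event_date, bucketed on customer_id to speed joins and sampling.
CREATE TABLE IF NOT EXISTS customer_behavior (
  customer_id STRING,
  event_type  STRING,
  event_value DOUBLE
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- Load from an assumed staging table; the partition column comes last in the SELECT.
INSERT INTO TABLE customer_behavior PARTITION (event_date)
SELECT customer_id,
       event_type,
       event_value,
       to_date(event_ts) AS event_date
FROM stg_customer_events;
```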
Environment: Apache Spark, Hive 2.3, Informatica, BitBucket, HDFS, MapReduce, Scala, Apache Nifi 1.6, Yarn, HBase, PL/SQL, Mongo DB, Pig 0.16, Sqoop 1.2, Flume 1.8
Confidential - Dublin, OH
Data Architect/Data Modeler
Responsibilities:
- Worked as a Sr. Data Architect/Data Modeler to review business requirements and compose source-to-target data mapping documents.
- Walkthroughs with DBA were conducted to update the changes made to the data model.
- Assisted the Data Modeling team for the needs of the Clients from an Accounting business perspective.
- Identified areas of improvement to achieve data quality and ensured adherence to data quality standards.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Responsible for the data architecture design delivery, data model development, review, approval and Data warehouse implementation.
- Performed data analysis and data profiling using complex SQL on various source systems (see the SQL profiling sketch after this list).
- Used CA Erwin Data Modeler (Erwin) for data modeling and performed data analysis and profiling activities to identify volumes and data quality issues for Solution Designers and ETL Architects.
- Performed structural modifications using MapReduce and Hive and analyzed data using visualization/reporting tools.
- Worked with Architecture team to get the Metadata approved for the new data elements that are added for this project.
- Worked on Amazon database Redshift and NoSQL database Cassandra.
- Created 3NF business area data models with de-normalized physical implementations, and performed data and information requirements analysis using the Erwin tool.
- Analyzed data requirements & provided conceptual and technical modeling assistance to developers.
- Reviewed data models with Solution Designer to assess the impact of the new model on the enterprise model.
- Created Entity Relationships diagrams, data flow diagrams and implemented referential integrity using Erwin.
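A minimal SQL profiling sketch of the kind of checks described above, written against a hypothetical Oracle source table src_customer; the column names are illustrative:

```sql
-- Row counts, distinct keys, null rates, and date range for a source table.
SELECT COUNT(*)                                       AS total_rows,
       COUNT(DISTINCT customer_id)                    AS distinct_customer_ids,
       SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_emails,
       MIN(created_dt)                                AS earliest_record,
       MAX(created_dt)                                AS latest_record
FROM src_customer;
```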
Environment: Erwin r7.1, OBIEE, Oracle 9i, Oracle Warehouse Builder, Microsoft 2008, SQL Developer, SQL Manager, Crystal Reports, OLTP.
Confidential, Juno Beach, FL
Sr. Data Analyst / Data Modeler
Responsibilities:
- As a Sr. Data Analyst / Data Modeler I was responsible for all data related aspects of a project.
- Participated in requirement gathering sessions and JAD sessions with users, Subject Matter Experts, Architects, and BAs.
- Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
- Translated business and data requirements into data models in support of Enterprise Data Models, Data Warehouse and Analytical systems.
- Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
- Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
- Designed both 3NF data models for ODS/OLTP systems and dimensional data models using star and snowflake schemas.
- Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects (see the SQL reconciliation sketch after this list).
- Created and reviewed the conceptual model for the EDW (Enterprise Data Warehouse) with business user.
- Reverse Engineered DB2 databases and then forward engineered them to Teradata using E/R Studio.
- Analyzed the business requirements by dividing them into subject areas and understood the data flow within the organization.
- Created conceptual & logical models, logical entities and defined their attributes, and relationships between the various data objects.
- Created a list of domains in E/R Studio and worked on building up the data dictionary for the company
- Created a Data Mapping document after each assignment and wrote the transformation rules for each field as applicable
- Used the Model Mart of E/R Studio for effective model management, enabling sharing, dividing, and reusing of model information and designs for productivity improvement.
- Created E/R Studio reports in HTML and RTF formats depending upon the requirement, published the data model in the Model Mart, created naming convention files, and coordinated with DBAs to apply the data model changes.
- Used E/R Studio for reverse engineering to connect to existing database and ODS to create graphical representation in the form of Entity Relationships and elicit more information
- Performed data management projects and fulfilled ad-hoc requests according to user specifications by utilizing data management software programs.
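A hedged SQL reconciliation sketch of the report-validation work noted above: totals computed directly from the warehouse are compared against the figures a Business Objects report is expected to show. The table names (fact_policy, dim_date) and the filter value are illustrative assumptions:

```sql
-- Warehouse-side totals for comparison with the corresponding report output.
SELECT d.fiscal_month,
       COUNT(*)              AS policy_count,
       SUM(f.premium_amount) AS premium_total
FROM fact_policy f
JOIN dim_date d
  ON f.effective_date_key = d.date_key
WHERE d.fiscal_year = 2012      -- illustrative filter
GROUP BY d.fiscal_month
ORDER BY d.fiscal_month;
```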
Environment: Oracle 10g, Microsoft SQL Server 2012, SQL Developer, SQL Manager, Erwin r9, SQL Developer Data Modeler, Visio, Informatica, Crystal Reports
Confidential
Data Analyst
Responsibilities:
- Worked closely with various business teams in gathering the business requirements.
- Worked with business analysts to design weekly reports using Crystal Reports.
- Experienced in data cleansing and data migration for accurate reporting.
- Worked extensively on SQL querying using Joins, Alias, Functions, Triggers and Indexes.
- Improved performance of SQL queries by using indexes for tuning, created DDL scripts for the database, and created PL/SQL procedures and triggers.
- Created tables, views, sequences, triggers, table spaces, constraints and generated DDL scripts for physical implementation.
- Performed data mining on data using very complex SQL queries and discovered patterns.
- Wrote T-SQL statements for retrieval of data and Involved in performance tuning of T-SQL queries and Stored Procedures.
- Performed data analysis and statistical analysis, and generated reports, listings, and graphs using SAS tools: SAS/Base, SAS/Macros, SAS/Graph, SAS/SQL, SAS/Connect, and SAS/Access.
- Wrote PL/SQL statements, stored procedures, and triggers in DB2 for extracting as well as writing data.
- Developed SQL Server database to replace existing Access databases.
- Performed thorough data analysis for the purpose of overhauling the database using SQL Server.
- Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
- Developed SQL scripts involving complex joins for reporting purposes (see the SQL sketch after this list).
- Assisted with designing database packages and procedures.
- Involved in defining the source to target data mappings, business rules, data definitions.
- Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization.
- Worked on database testing, wrote complex SQL queries to verify the transactions and business logic.
- Wrote ad-hoc SQL queries and worked with SQL and Netezza databases.
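An illustrative example of the multi-join reporting SQL mentioned above; the orders, customers, products, and order_returns tables and their columns are assumptions, not the actual schema:

```sql
-- Regional sales by product line, excluding returned orders.
SELECT c.region,
       p.product_line,
       COUNT(o.order_id)   AS order_count,
       SUM(o.order_amount) AS total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products  p ON o.product_id  = p.product_id
LEFT JOIN order_returns r ON o.order_id = r.order_id
WHERE r.order_id IS NULL          -- keep only orders with no matching return
GROUP BY c.region, p.product_line
ORDER BY total_amount DESC;
```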
Environment: Crystal Reports, T-SQL, SAS, PL/SQL, DB2, SQL Server, MS PowerPoint, MS Access, SQL Assistant, MySQL