Sr. Big Data Engineer Resume
Philadelphia, PA
SUMMARY:
- Overall 8+ years of IT experience in the implementation and operation of multi-tiered, distributed applications and web-based applications as a Big Data Engineer and Data Modeler/Analyst.
- Experience in NoSQL databases like HBase, Cassandra & MongoDB, database performance tuning & data modeling.
- Work in a fast-paced agile development environment to quickly analyze, develop, and test potential use cases for the business.
- Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
- Developed and built frameworks/prototypes that integrate Big Data and advanced analytics to support business decisions.
- Experience loading data through ETL into SQL Server, Oracle, and other relational and non-relational databases.
- Experience in RDBMS (Oracle) PL/SQL, SQL, Stored Procedures, Functions, Packages, Triggers.
- Expertise in Relational data modeling (3NF) and Dimensional data modeling.
- Practical understanding of the Data modeling (Dimensional & Relational) concepts like Star-Schema Modeling, Snowflake Schema Modeling, Fact and Dimension tables.
- Assist application development teams during application design and development for highly complex and critical data projects
- Work closely with development, test, documentation and product management teams to deliver high quality products and services in a fast paced environment
- Experience in Data Profiling, Data Analysis, Data Cleansing and Data Masking.
- Excellent understanding of Hadoop Architecture and underlying Hadoop Framework including Storage Management.
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Experience in analyzing data using HiveQL and custom MapReduce programs
- Proficient in data mart design, creation of cubes, identifying facts & dimensions, and star & snowflake schemas.
- Expertise in writing Hadoop Jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
- Experience with Data Warehouse Netezza and have worked extensively on PL/SQL.
- Comprehensive knowledge and experience in process improvement, normalization/de-normalization.
- Hands on experience in installing, configuring, and using Hadoop components like Hadoop Map Reduce, HDFS, HBase, Hive, Sqoop and Flume.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics (see the HiveQL sketch after this list).
- Hands-on experience in Linux shell scripting and in working with the Cloudera Big Data distribution.
- Good understanding of Machine Learning and statistical analysis with MATLAB.
- Configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.
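A minimal HiveQL sketch of the kind of trend-spotting query described above; the table and column names (stg_daily_sales for the fresh load, edw_product_dim for the EDW reference, sales_history_agg for historical metrics) are illustrative assumptions, not an actual schema:

```sql
-- Illustrative names only: stg_daily_sales (fresh load), edw_product_dim (EDW
-- reference), sales_history_agg (historical daily averages).
SELECT d.product_id,
       p.category,
       SUM(d.sale_amount)                          AS current_day_sales,
       AVG(h.avg_daily_sales)                      AS trailing_avg_sales,
       SUM(d.sale_amount) / AVG(h.avg_daily_sales) AS lift_ratio
FROM stg_daily_sales d
JOIN edw_product_dim p
  ON d.product_id = p.product_id
LEFT JOIN sales_history_agg h
  ON d.product_id = h.product_id
GROUP BY d.product_id, p.category
HAVING SUM(d.sale_amount) > 1.5 * AVG(h.avg_daily_sales);
```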
TECHNICAL SKILLS:
Big Data / Hadoop Ecosystem: MapReduce, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Neo4j, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11
Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6.
Cloud Management: Amazon Web Services(AWS), Amazon Redshift
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2
Testing and defect tracking Tools: HP/Mercury, Quality Center, Win Runner, MS Visio & Visual Source Safe
Operating System: Windows, Unix, Sun Solaris
ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho
Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model
PROFESSIONAL EXPERIENCE:
Confidential - Philadelphia, PA
Sr. Big Data Engineer
Responsibilities:
- As a Sr. Big Data Engineer, provided technical expertise and guidance on Hadoop technologies as they relate to the development of analytics.
- Responsible for the planning and execution of big data analytics, predictive analytics and machine learning initiatives.
- Assisted in leading the plan, building, and running states within the Enterprise Analytics Team.
- Engaged in solving and supporting real business issues by applying knowledge of Hadoop distributed file systems and open-source frameworks.
- Performed detailed analysis of business problems and technical environments and used the findings to design solutions and maintain the data architecture.
- Designed and developed software applications, testing, and building automation tools.
- Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
- Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
- Conducted performance tuning of Hadoop clusters while monitoring and managing Hadoop cluster job performance, capacity forecasting, and security.
- Built data platforms, pipelines, and storage systems using Apache Kafka, Apache Storm, and search technologies such as Elasticsearch.
- Led the architecture and design of data processing, warehousing, and analytics initiatives.
- Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies using Hadoop, MapReduce, HBase, Hive and Cloud Architecture.
- Worked on implementation and maintenance of Cloudera Hadoop cluster.
- Created Hive external tables to stage data and then moved the data from staging to main tables (see the HiveQL sketch after this list).
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into the HDFS system.
- Pulled data from the data lake (HDFS) and transformed it with various RDD transformations.
- Active involvement in design, new development and SLA based support tickets of Big Machines applications.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Developed Oozie workflow jobs to execute hive, Sqoop and MapReduce actions.
- Provided thought leadership for architecture and the design of Big Data Analytics solutions for customers, actively drive Proof of Concept (POC) and Proof of Technology (POT) evaluations and to implement a Big Data solution.
- Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
- Created Data Pipeline using Processor Groups and multiple processors using Apache Nifi for Flat File, RDBMS as part of a POC using Amazon EC2.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory computation to generate the output response.
- Developed complete end to end Big-data processing in Hadoop eco-system.
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark and to provide visualization of the ETL orchestration using the CDAP tool.
- Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
- Conducted proofs of concept to determine the feasibility of and evaluate Big Data products.
- Wrote Hive join queries to fetch information from multiple tables and multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard
- Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Developed customized classes for serialization and De-serialization in Hadoop.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Implemented a proof of concept deploying this product in Amazon Web Services AWS.
- Involved in migration of data from existing RDBMS (Oracle and SQL server) to Hadoop using Sqoop for processing data
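A hedged HiveQL sketch of the staging-to-main-table flow mentioned above (an external staging table over Sqoop-landed files, then an insert into a date-partitioned managed table); the table, column, and path names are assumptions for illustration:

```sql
-- Illustrative external staging table over files landed in HDFS.
CREATE EXTERNAL TABLE IF NOT EXISTS stg_transactions (
  txn_id     STRING,
  account_id STRING,
  txn_amount DOUBLE,
  txn_ts     STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/staging/transactions';

-- Move refined rows into a date-partitioned main table (assumed to exist).
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT OVERWRITE TABLE transactions PARTITION (txn_date)
SELECT txn_id,
       account_id,
       txn_amount,
       txn_ts,
       to_date(txn_ts) AS txn_date
FROM stg_transactions;
```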
Environment: Hadoop 3.0, MapReduce, HBase, Hive 2.3, Informatica, HDFS, Scala 2.12, Spark, Sqoop 1.4, Apache Nifi, AWS, EC2, SQL Server, Oracle 12c.
Confidential - Charlotte, NC
Sr. Big Data Engineer
Responsibilities:
- As a Sr. Big Data Engineer, provided technical expertise and guidance on Hadoop technologies as they relate to the development of analytics.
- Worked in a Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
- Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, MapReduce, Shell Scripting, Hive.
- Used Agile (SCRUM) methodologies for Software Development.
- Conducted performance tuning of Hadoop clusters while monitoring and managing Hadoop cluster job performance, capacity forecasting, and security.
- Used Big Data analytics technologies and applications in business intelligence analyses.
- Responsible for the planning and execution of big data analytics, predictive analytics and machine learning initiatives.
- Assisted in leading the plan, building, and running states within the Enterprise Analytics Team.
- Engaged in solving and supporting real business issues by applying knowledge of Hadoop distributed file systems and open-source frameworks.
- Performed detailed analysis of business problems and technical environments and used the findings to design solutions and maintain the data architecture.
- Designed and developed software applications, testing, and building automation tools.
- Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences
- Worked in exporting data from Hive tables into Netezza database.
- Developed analytics enablement layer using ingested data that facilitates faster reporting and dashboards.
- Worked with production support team to provide necessary support for issues with CDH cluster and the data ingestion platform.
- Led the architecture and design of data processing, warehousing, and analytics initiatives.
- Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies using Hadoop, MapReduce, HBase, Hive and Cloud Architecture.
- Worked on implementation and maintenance of Cloudera Hadoop cluster.
- Created Hive external tables to stage data and then moved the data from staging to main tables.
- Developed Oozie workflow jobs to execute Hive, Sqoop, and MapReduce actions.
- Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
- Created Data Pipeline using Processor Groups and multiple processors using Apache Nifi for Flat File, RDBMS as part of a POC using Amazon EC2.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory computation to generate the output response.
- Developed complete end to end Big-data processing in Hadoop eco-system.
- Performed File system management and monitoring on Hadoop log files.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Used Flume to collect, aggregate, and store the web log data from various sources like web servers, mobile and network devices and pushed to HDFS.
- Wrote Hive join queries to fetch information from multiple tables and multiple MapReduce jobs to collect output from Hive.
- Used Hive to analyze data ingested into HBase via the Hive-HBase integration and computed various metrics for reporting on the dashboard (see the HiveQL sketch after this list).
- Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
- Developed the code for importing and exporting data into HDFS and Hive using Sqoop.
- Involved in migration of data from existing RDBMS (Oracle and SQL server) to Hadoop using Sqoop for processing data.
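A sketch of the Hive-HBase integration pattern referenced above, assuming a hypothetical HBase table web_events with a single column family m; all table and column names are illustrative:

```sql
-- Map a Hive external table onto an existing HBase table via the storage handler.
CREATE EXTERNAL TABLE IF NOT EXISTS hbase_web_events (
  event_key   STRING,
  user_id     STRING,
  page        STRING,
  duration_ms BIGINT
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,m:user_id,m:page,m:duration_ms")
TBLPROPERTIES ("hbase.table.name" = "web_events");

-- Dashboard metrics computed over the HBase-backed table.
SELECT page,
       COUNT(DISTINCT user_id) AS unique_users,
       AVG(duration_ms)        AS avg_duration_ms
FROM hbase_web_events
GROUP BY page;
```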
Environment: Hadoop 3.0, MapReduce, HBase, Hive 2.3, Informatica, HDFS, Sqoop 1.4, Apache Nifi, AWS, EC2, SQL Server, Oracle 12c, Apache Flume
Confidential - Mt Laurel, NJ
Sr. Big Data Consultant
Responsibilities:
- Worked as a Big Data implementation engineer within a team of professionals.
- Used Agile methodology for data warehouse development, managed with Kanbanize.
- Worked with Hadoop Ecosystem components like HBase, Sqoop, Zookeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
- Developed data pipeline using Spark, Hive and HBase to ingest customer behavioral data and financial histories into Hadoop cluster for analysis.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
- Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
- Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
- Implemented partitioning, dynamic partitions, and buckets in Hive (see the HiveQL sketch after this list).
- Involved in different phases of the development life cycle, including Analysis, Design, Coding, Unit Testing, Integration Testing, Review, and Release, as per the business requirements.
- Developed Big Data solutions focused on pattern matching and predictive modeling
- The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
- Created Hive external tables to stage data and then moved the data from staging to main tables.
- Implemented the Big Data solution using Hadoop, Hive, and Informatica to pull/load the data into the HDFS system.
- Pulled data from the data lake (HDFS) and transformed it with various RDD transformations.
- Developed Scala scripts and UDFs using both DataFrames/Spark SQL and RDDs/MapReduce in Spark for data aggregation and queries, and wrote data back into RDBMS through Sqoop.
- Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
- Created Data Pipeline using Processor Groups and multiple processors using Apache Nifi for Flat Files, RDBMS as part of a POC using Amazon EC2.
- Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
- Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory computation to generate the output response.
- Developed complete end-to-end Big Data processing in the Hadoop ecosystem.
- Used AWS Cloud with Infrastructure Provisioning / Configuration.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Worked on configuring and managing disaster recovery and backup on Cassandra Data.
- Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Continuously tuned Hive UDFs and queries for faster execution by employing partitioning and bucketing.
- Developed various QlikView data models by extracting and using data from various sources, including Excel files, Big Data sources, and flat files.
- Reviewed requirements with the QA Manager and ETL leads to enhance the data warehouse for the originations and servicing systems.
- Enforced referential integrity in the OLTP data model for consistent relationships between tables and efficient database design.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.
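A minimal HiveQL sketch of the partitioning and bucketing approach noted above (dynamic partitions plus bucketing on the join key); table and column names such as customer_behavior and stg_customer_events are assumptions:

```sql
-- Enable dynamic partitioning for the load below.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Partitioned by event_date, bucketed on customer_id to speed joins and sampling.
CREATE TABLE IF NOT EXISTS customer_behavior (
  customer_id STRING,
  event_type  STRING,
  event_value DOUBLE
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- Load from an assumed staging table; the partition column comes last in the SELECT.
INSERT INTO TABLE customer_behavior PARTITION (event_date)
SELECT customer_id,
       event_type,
       event_value,
       to_date(event_ts) AS event_date
FROM stg_customer_events;
```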
Environment: Apache Spark, Hive 2.3, Informatica, BitBucket, HDFS, MapReduce, Scala, Apache Nifi 1.6, Yarn, HBase, PL/SQL, Mongo DB, Pig 0.16, Sqoop 1.2, Flume 1.8
Confidential - Dublin, OH
Data Architect/Data Modeler
Responsibilities:
- Worked as a Sr. Data Architect/Data Modeler to review business requirements and compose source-to-target data mapping documents.
- Walkthroughs with DBA were conducted to update the changes made to the data model.
- Assisted the Data Modeling team for the needs of the Clients from an Accounting business perspective.
- Identified areas of improvement to achieve data quality and ensured adherence to data quality standards.
- Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Responsible for the data architecture design delivery, data model development, review, approval and Data warehouse implementation.
- Performed data analysis and data profiling using complex SQL on various source systems (see the SQL profiling sketch after this list).
- Used CA Erwin Data Modeler (Erwin) for data modeling and performed data analysis and profiling activities to identify volumes and data quality issues for Solution Designers and ETL Architects.
- Performed structural modifications using MapReduce and Hive and analyzed data using visualization/reporting tools.
- Worked with Architecture team to get the Metadata approved for the new data elements that are added for this project.
- Worked on Amazon database Redshift and NoSQL database Cassandra.
- Created 3NF business area data models with de-normalized physical implementations, and performed data and information requirements analysis using the Erwin tool.
- Analyzed data requirements & provided conceptual and technical modeling assistance to developers.
- Reviewed data models with Solution Designer to assess the impact of the new model on the enterprise model.
- Created Entity Relationships diagrams, data flow diagrams and implemented referential integrity using Erwin.
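A minimal SQL profiling sketch of the kind of checks described above, written against a hypothetical Oracle source table src_customer; the column names are illustrative:

```sql
-- Row counts, distinct keys, null rates, and date range for a source table.
SELECT COUNT(*)                                       AS total_rows,
       COUNT(DISTINCT customer_id)                    AS distinct_customer_ids,
       SUM(CASE WHEN email IS NULL THEN 1 ELSE 0 END) AS null_emails,
       MIN(created_dt)                                AS earliest_record,
       MAX(created_dt)                                AS latest_record
FROM src_customer;
```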
Environment: Erwin r7.1, OBIEE, Oracle 9i, Oracle Warehouse Builder, Microsoft 2008, SQL Developer, SQL Manager, Crystal Reports, OLTP.
Confidential, Juno Beach, FL
Sr. Data Analyst / Data Modeler
Responsibilities:
- As a Sr. Data Analyst / Data Modeler I was responsible for all data related aspects of a project.
- Participated in requirement gathering sessions and JAD sessions with users, Subject Matter Experts, Architects, and BAs.
- Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
- Translated business and data requirements into data models in support of Enterprise Data Models, Data Warehouse and Analytical systems.
- Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
- Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
- Designed both 3NF data models for ODS/OLTP systems and dimensional data models using star and snowflake schemas.
- Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects (see the SQL reconciliation sketch after this list).
- Created and reviewed the conceptual model for the EDW (Enterprise Data Warehouse) with business user.
- Reverse Engineered DB2 databases and then forward engineered them to Teradata using E/R Studio.
- Analyzed the business requirements by dividing them into subject areas and understood the data flow within the organization.
- Created conceptual & logical models, logical entities and defined their attributes, and relationships between the various data objects.
- Created a list of domains in E/R Studio and worked on building up the data dictionary for the company
- Created a Data Mapping document after each assignment and wrote the transformation rules for each field as applicable
- Used the Model Mart of E/R Studio for effective model management, enabling sharing, dividing, and reusing of model information and designs for productivity improvement.
- Created E/R Studio reports in HTML and RTF formats depending upon the requirement, published the data model in the Model Mart, created naming convention files, and coordinated with DBAs to apply the data model changes.
- Used E/R Studio for reverse engineering to connect to existing database and ODS to create graphical representation in the form of Entity Relationships and elicit more information
- Performed data management projects and fulfilled ad-hoc requests according to user specifications by utilizing data management software programs.
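A hedged SQL reconciliation sketch of the report-validation work noted above: totals computed directly from the warehouse are compared against the figures a Business Objects report is expected to show. The table names (fact_policy, dim_date) and the filter value are illustrative assumptions:

```sql
-- Warehouse-side totals for comparison with the corresponding report output.
SELECT d.fiscal_month,
       COUNT(*)              AS policy_count,
       SUM(f.premium_amount) AS premium_total
FROM fact_policy f
JOIN dim_date d
  ON f.effective_date_key = d.date_key
WHERE d.fiscal_year = 2012      -- illustrative filter
GROUP BY d.fiscal_month
ORDER BY d.fiscal_month;
```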
Environment: Oracle 10g, Microsoft SQL Server 2012, SQL Developer, SQL Manager, Erwin r9, SQL Developer Data Modeler, Visio, Informatica, Crystal Reports
Confidential
Data Analyst
Responsibilities:
- Worked closely with various business teams in gathering the business requirements.
- Worked with business analysts to design weekly reports using Crystal Reports.
- Experienced in data cleansing and data migration for accurate reporting.
- Worked extensively on SQL querying using Joins, Alias, Functions, Triggers and Indexes.
- Improved performance of SQL queries by using indexes for tuning, created DDL scripts for the database, and created PL/SQL procedures and triggers.
- Created tables, views, sequences, triggers, table spaces, constraints and generated DDL scripts for physical implementation.
- Performed data mining on data using very complex SQL queries and discovered patterns.
- Wrote T-SQL statements for retrieval of data and Involved in performance tuning of T-SQL queries and Stored Procedures.
- Performed data analysis and statistical analysis, and generated reports, listings, and graphs using SAS tools: SAS/Base, SAS/Macros, SAS/Graph, SAS/SQL, SAS/Connect, and SAS/Access.
- Wrote PL/SQL statements, stored procedures, and triggers in DB2 for extracting as well as writing data.
- Developed SQL Server database to replace existing Access databases.
- Performed thorough data analysis for the purpose of overhauling the database using SQL Server.
- Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
- Developed SQL scripts involving complex joins for reporting purposes (see the SQL sketch after this list).
- Assisted with designing database packages and procedures.
- Involved in defining the source to target data mappings, business rules, data definitions.
- Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization.
- Worked on database testing, wrote complex SQL queries to verify the transactions and business logic.
- Wrote ad-hoc SQL queries and worked with SQL and Netezza databases.
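An illustrative example of the multi-join reporting SQL mentioned above; the orders, customers, products, and order_returns tables and their columns are assumptions, not the actual schema:

```sql
-- Regional sales by product line, excluding returned orders.
SELECT c.region,
       p.product_line,
       COUNT(o.order_id)   AS order_count,
       SUM(o.order_amount) AS total_amount
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products  p ON o.product_id  = p.product_id
LEFT JOIN order_returns r ON o.order_id = r.order_id
WHERE r.order_id IS NULL          -- keep only orders with no matching return
GROUP BY c.region, p.product_line
ORDER BY total_amount DESC;
```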
Environment: Crystal Reports, T-SQL, SAS, PL/SQL, DB2, SQL Server, MS PowerPoint, MS Access, SQL Assistant, MySQL