Sr. Big Data Engineer Resume

Charlotte, NC

SUMMARY:

  • 8+ years of IT experience implementing and supporting multi-tiered, distributed, and web-based applications as a Big Data Engineer and Data Modeler/Analyst.
  • Experience in NoSQL databases like HBase, Cassandra & MongoDB, database performance tuning & data modeling.
  • Work in a fast-paced agile development environment to quickly analyze, develop, and test potential use cases for the business.
  • Excellent knowledge of Hadoop architecture and ecosystem components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node, and the MapReduce programming paradigm.
  • Develops and builds frameworks/prototypes that integrate Big Data and advanced analytics to make business decisions
  • Experience loading data through ETL into SQL Server, Oracle, and other relational and non-relational databases.
  • Experience in RDBMS (Oracle) PL/SQL, SQL, Stored Procedures, Functions, Packages, Triggers.
  • Expertise in Relational Data modeling (3NF) and Dimensional data modeling.
  • Practical understanding of the Data modeling (Dimensional & Relational) concepts like Star-Schema Modeling, Snowflake Schema Modeling, Fact and Dimension tables.
  • Assist application development teams during application design and development for highly complex and critical data projects
  • Work closely with development, test, documentation and product management teams to deliver high quality products and services in a fast paced environment
  • Experience in Data profiling, Data Analysis, Data Cleansing and Data Masking.
  • Excellent understanding of Hadoop Architecture and underlying Hadoop Framework including Storage Management.
  • Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
  • Experience in analyzing data using HiveQL and custom MapReduce programs
  • Proficient in data mart design, creation of cubes, identifying facts & dimensions, and star & snowflake schemas.
  • Expertise in writing Hadoop Jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, and Splunk.
  • Experience with the Netezza data warehouse and extensive work with PL/SQL.
  • Comprehensive knowledge and experience in process improvement, normalization/de-normalization.
  • Hands on experience in installing, configuring, and using Hadoop components like Hadoop Map Reduce, HDFS, HBase, Hive, Sqoop and Flume.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Hands on experience in Linux Shell Scripting and in working with the Cloudera Big Data distribution.
  • Good understanding of Machine Learning and statistical analysis with Matlab.
  • Configuring and administering the Hadoop Cluster using major Hadoop Distributions like Apache Hadoop and Cloudera.

TECHNICAL SKILLS:

Big Data / Hadoop Ecosystem: MapReduce, HBase 1.2, Hive 2.3, Pig 0.17, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, Neo4j, Hadoop 3.0, Apache Nifi 1.6, Cassandra 3.11

Data Modeling Tools: Erwin Data Modeler, Erwin Model Manager, ER Studio v17, and Power Designer 16.6.

Cloud Management: Amazon Web Services(AWS), Amazon Redshift

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Cloud Platform: AWS, Azure, Google Cloud, Cloud Stack/Open Stack

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.

Testing and defect tracking Tools: HP/Mercury, Quality Center, Win Runner, MS Visio & Visual Source Safe

Operating System: Windows, Unix, Sun Solaris

ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.

Methodologies: RAD, JAD, RUP, UML, System Development Life Cycle (SDLC), Agile, Waterfall Model.

WORK EXPERIENCE:

Confidential, Charlotte, NC

Sr. Big Data Engineer

Responsibilities:

  • As a Sr. Big Data Engineer, provided technical expertise in Hadoop technologies as they relate to the development of analytics.
  • Worked on Hadoop ecosystem implementation/administration, installing software patches along with system upgrades and configuration.
  • Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, MapReduce, Shell Scripting, Hive.
  • Used Agile (SCRUM) methodologies for Software Development.
  • Conducted performance tuning of Hadoop clusters while monitoring and managing Hadoop cluster job performance, capacity forecasting, and security.
  • Used Big Data analytic technologies and applications in business intelligence analyses.
  • Responsible for the planning and execution of big data analytics, predictive analytics and machine learning initiatives.
  • Assisted in leading the plan, build, and run stages within the Enterprise Analytics Team.
  • Solved and supported real business issues using knowledge of the Hadoop Distributed File System and open-source frameworks.
  • Performed detailed analysis of business problems and technical environments and used this data in designing the solution and maintaining data architecture.
  • Designed and developed software applications, testing, and building automation tools.
  • Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
  • Developed analytics enablement layer using ingested data that facilitates faster reporting and dashboards.
  • Worked with production support team to provide necessary support for issues with CDH cluster and the data ingestion platform.
  • Led architecture and design of data processing, warehousing and analytics initiatives.
  • Implemented solutions for ingesting data from various sources and processing the Data-at-Rest utilizing Big Data technologies using Hadoop, MapReduce, HBase, Hive and Cloud Architecture.
  • Worked on implementation and maintenance of Cloudera Hadoop cluster.
  • Created Hive External tables to stage data and then move the data from Staging to main tables
  • Developed Oozie workflow jobs to execute hive, Sqoop and MapReduce actions.
  • Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
  • Created Data Pipeline using Processor Groups and multiple processors using Apache Nifi for Flat File, RDBMS as part of a POC using Amazon EC2.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
  • Developed complete end to end Big-data processing in Hadoop eco-system.
  • Performed File system management and monitoring on Hadoop log files.
  • Utilized Oozie workflows to run Pig and Hive jobs. Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Used Flume to collect, aggregate, and store the web log data from various sources like web servers, mobile and network devices and pushed to HDFS.
  • Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard
  • Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
  • Developed the code for Importing and exporting data into HDFS and Hive using Sqoop
  • Involved in migration of data from existing RDBMS (Oracle and SQL server) to Hadoop using Sqoop for processing data.

Environment: Hadoop 3.0, MapReduce, HBase, Hive 2.3, Informatica, HDFS, Sqoop 1.4, Apache Nifi, AWS, EC2, SQL server, Oracle 12c, Apache Flume

Confidential, Mt Laurel, NJ

Sr. Data Engineer

Responsibilities:

  • Worked as a Big Data implementation engineer within a team of professionals.
  • Used Agile Methodology of Data Warehouse development using Kanbanize.
  • Worked with Hadoop Ecosystem components like HBase, Sqoop, Zookeeper, Oozie, Hive and Pig with Cloudera Hadoop distribution.
  • Developed data pipeline using Spark, Hive and HBase to ingest customer behavioral data and financial histories into Hadoop cluster for analysis.
  • Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) to study customer behavior.
  • Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Installed and configured Hadoop and was responsible for maintaining the cluster and managing and reviewing Hadoop log files.
  • Implemented partitioning, dynamic partitions and buckets in Hive.
  • Involved in different phases of the development life cycle including Analysis, Design, Coding, Unit Testing, Integration Testing, Review and Release as per the business requirements.
  • Developed Big Data solutions focused on pattern matching and predictive modeling
  • The objective of this project was to build a data lake as a cloud-based solution in AWS using Apache Spark.
  • Created Hive External tables to stage data and then move the data from Staging to main tables
  • Worked in exporting data from Hive tables into Netezza database.
  • Implemented the Big Data solution using Hadoop, hive and Informatica to pull/load the data into the HDFS system.
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Developed Scala scripts and UDFs using both Data frames/SQL and RDD/MapReduce in Spark for Data Aggregation, queries and writing data back into RDBMS through Sqoop.
  • Developed Spark code using Scala and Spark-SQL/Streaming for faster processing of data.
  • Created Data Pipeline using Processor Groups and multiple processors using Apache Nifi for Flat Files, RDBMS as part of a POC using Amazon EC2.
  • Built Hadoop solutions for big data problems using MR1 and MR2 in YARN.
  • Loaded data from different sources such as HDFS or HBase into Spark RDDs and implemented in-memory data computation to generate the output response.
  • Developed complete end to end Big-data processing in Hadoop eco system.
  • Used AWS Cloud with Infrastructure Provisioning / Configuration.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Worked on configuring and managing disaster recovery and backup on Cassandra Data.
  • Utilized Oozie workflows to run Pig and Hive jobs. Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
  • Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
  • Developed various QlikView data models by extracting and using data from various source files, Excel, Big Data, and flat files.
  • Reviewed requirements with the QA Manager and ETL leads to enhance the data warehouse for the originations systems and servicing systems.
  • Enforced referential integrity in the OLTP data model for consistent relationship between tables and efficient database design.
  • Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
  • Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive and Sqoop.

Environment: Apache Spark, Hive 2.3, Informatica, HDFS, MapReduce, Scala, Apache Nifi 1.6, Yarn, HBase, PL/SQL, MongoDB, Pig 0.16, Sqoop 1.2, Flume 1.8

Confidential, Dublin, OH

Data Architect/Data Modeler

Responsibilities:

  • Worked as a Sr. Data Architect/Data Modeler to review business requirement and compose source to target data mapping documents.
  • Conducted walkthroughs with the DBA to communicate the changes made to the data model.
  • Assisted the Data Modeling team for the needs of the Clients from an Accounting business perspective.
  • Identified areas of improvement to achieve data quality and ensured adherence to data quality standards.
  • Researched, evaluated, architected, and deployed new tools, frameworks and patterns to build sustainable Big Data platforms.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Responsible for the data architecture design delivery, data model development, review, approval and Data warehouse implementation.
  • Performed data analysis and data profiling using complex SQL on various source systems.
  • Used CA Erwin Data Modeler (Erwin) for data modeling. Performed data analysis and profiling activities to identify volumes and data quality issues for Solution Designers & ETL Architects.
  • Performed structural modifications using MapReduce and Hive, and analyzed data using visualization/reporting tools.
  • Worked with Architecture team to get the Metadata approved for the new data elements that are added for this project.
  • Worked on Amazon database Redshift and NoSQL database Cassandra.
  • Created 3NF business area data modeling with de-normalized physical implementation data and information requirements analysis using Erwin tool.
  • Analyzed data requirements & provided conceptual and technical modeling assistance to developers.
  • Reviewed data models with Solution Designer to assess the impact of the new model on the enterprise model.
  • Created Entity Relationships diagrams, data flow diagrams and implemented referential integrity using Erwin.

Environment: Erwin r7.1, OBIEE, Oracle 9i, Oracle Warehouse Builder, Microsoft 2008, SQL Developer, SQL Manager, Crystal Reports, OLTP.

Confidential, Juno Beach, FL

Sr. Data Analyst / Data Modeler

Responsibilities:

  • As a Sr. Data Analyst / Data Modeler I was responsible for all data related aspects of a project.
  • Participated in requirement gathering sessions and JAD sessions with users, Subject Matter Experts, Architects and BAs.
  • Optimized and updated UML Models (Visio) and Relational Data Models for various applications.
  • Translated business and data requirements into data models in support of Enterprise Data Models, Data Warehouse and Analytical systems.
  • Worked with Business Analysts team in requirements gathering and in preparing functional specifications and translating them to technical specifications.
  • Worked with Business users during requirements gathering and prepared Conceptual, Logical and Physical Data Models.
  • Designed both 3NF data models for ODS, OLTP systems and dimensional data models using star and snowflake Schemas
  • Wrote complex SQL queries for validating the data against different kinds of reports generated by Business Objects.
  • Created and reviewed the conceptual model for the EDW (Enterprise Data Warehouse) with business user.
  • Reverse Engineered DB2 databases and then forward engineered them to Teradata using E/R Studio.
  • Analyzed the business requirements by dividing them into subject areas and understood the data flow within the organization.
  • Created conceptual & logical models, logical entities and defined their attributes, and relationships between the various data objects.
  • Created a list of domains in E/R Studio and worked on building up the data dictionary for the company
  • Created a Data Mapping document after each assignment and wrote the transformation rules for each field as applicable
  • Used Model Mart of E/R Studio for effective model management of sharing, dividing and reusing model information and design for productivity improvement.
  • Created E/R Studio reports in HTML and RTF format depending upon the requirement, published data models in Model Mart, created naming convention files, and coordinated with DBAs to apply the data model changes.
  • Used E/R Studio for reverse engineering to connect to existing databases and the ODS to create graphical representations in the form of Entity Relationships and elicit more information.
  • Performed data management projects and fulfilled ad-hoc requests according to user specifications by utilizing data management software programs.

Environment: Oracle 10g, Microsoft SQL Server 2012, SQL Developer, SQL Manager, Erwin r9, SQL Developer Data Modeler, Visio, Informatica, Crystal Reports

Confidential

Data Analyst

Responsibilities:

  • Worked closely with various business teams in gathering the business requirements.
  • Worked with business analysts to design weekly reports using Crystal Reports.
  • Experienced in data cleansing and data migration for accurate reporting.
  • Worked extensively on SQL querying using Joins, Alias, Functions, Triggers and Indexes.
  • Improved SQL query performance by using indexes for tuning, created DDL scripts for the database, and created PL/SQL procedures and triggers.
  • Created tables, views, sequences, triggers, table spaces, constraints and generated DDL scripts for physical implementation.
  • Performed data mining using very complex SQL queries and discovered patterns.
  • Wrote T-SQL statements for retrieval of data and was involved in performance tuning of T-SQL queries and Stored Procedures.
  • Performed data analysis, statistical analysis, generated reports, listings and graphs using SAS tools-SAS/Base, SAS/Macros and SAS graph, SAS/SQL, SAS/Connect, and SAS/Access.
  • Wrote PL/SQL statements, stored procedures and triggers in DB2 for extracting as well as writing data.
  • Developed SQL Server database to replace existing Access databases.
  • Performed thorough data analysis for the purpose of overhauling the database using SQL Server.
  • Involved with data profiling for multiple sources and answered complex business questions by providing data to business users.
  • Developed SQL scripts involving complex joins for reporting purposes.
  • Assisted with designing database packages and procedures.
  • Involved in defining the source to target data mappings, business rules, data definitions.
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization.
  • Worked on database testing, wrote complex SQL queries to verify the transactions and business logic.
  • Wrote ad-hoc SQL queries and worked with SQL and Netezza databases.

Environment: Crystal Reports, T-SQL, SAS, PL/SQL, DB2, SQL Server, MS Power Point, MS Access, SQL assistant, MySQL
