
Sr. Big Data Engineer Resume

Deerfield, IL

SUMMARY:

  • Over 8 years of experience as a Big Data Engineer/Data Engineer and Data Analyst, including designing, developing, and implementing data models for enterprise-level applications and systems.
  • Professional IT experience that includes recent experience in the Big Data/Hadoop ecosystem.
  • Competence in using various Hadoop components such as MapReduce (MR1), YARN (MR2), HDFS, Pig, Hive, HBase, ZooKeeper, Oozie, and Hue.
  • Experience in building highly reliable, scalable Big Data solutions on Hadoop distributions such as Cloudera, Hortonworks, and AWS EMR.
  • Good experience in data modeling and data analysis; proficient in gathering business requirements and handling requirements management.
  • Experience working in Agile/Scrum development environments; participated in technical discussions with clients and contributed to project analysis and development specs.
  • Experience in transferring data from AWS S3 to AWS Redshift using Informatica.
  • Hands-on experience in normalization (1NF, 2NF, 3NF, and BCNF) and denormalization techniques for effective and optimal performance in OLTP and OLAP environments.
  • Expertise in moving structured schema data between Pig and Hive using HCatalog.
  • Excellent working experience in Scrum/Agile and Waterfall project execution methodologies.
  • Experience in migrating data between HDFS/Hive and relational database systems using Sqoop, in both directions, per client requirements.
  • Experience in SQL and good knowledge of PL/SQL programming; developed stored procedures and triggers; worked with DataStage, DB2, Unix, Cognos, MDM, Hadoop, and Pig.
  • Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
  • Proficient knowledge and hands on experience in writing shell scripts in Linux.
  • Good experience importing and exporting data between HDFS/Hive and relational database systems such as MySQL using Sqoop (a sketch follows this list).
  • Good knowledge of NoSQL databases including HBase, MongoDB, and MapR-DB.
  • Installation, configuration, and administration experience with Big Data platforms, including Cloudera Manager (Cloudera) and MCS (MapR).
  • Strong experience and knowledge of NoSQL databases such as MongoDB and Cassandra.
  • Strong knowledge of data warehouse architecture, Star and Snowflake schemas, and fact and dimension tables.
  • Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
  • Excellent technical and analytical skills with a clear understanding of the design goals of ER modeling for OLTP and dimensional modeling for OLAP.
  • Experience working with Relational Database Management Systems (RDBMS).
  • Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
  • Good understanding of service-oriented architecture (SOA) and web-service technologies such as XML and SOAP.
  • Experience in object-oriented analysis and design (OOAD), the Unified Modeling Language (UML), and design patterns.
  • Expertise in SQL Server Analysis Services (SSAS) and SQL Server Reporting Services (SSRS) tools.
  • Involved in writing SQL queries and PL/SQL programs; created new packages and procedures and modified and tuned existing procedures and queries using TOAD.
  • Good Understanding and experience in Data Mining Techniques like Classification, Clustering, Regression and Optimization.
  • Experience in complete project life cycle (design, development, testing and implementation) of Client Server and Web applications.
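
The Sqoop transfers between HDFS/Hive and MySQL mentioned above typically boil down to import/export commands; below is a minimal sketch that drives them from Python. The connection string, credentials file, table names, and HDFS paths are hypothetical placeholders, and the actual jobs may have been launched from shell scripts or Oozie rather than Python.

# Minimal sketch of an HDFS <-> MySQL transfer with Sqoop, driven from Python.
# The connection string, credentials file, tables, and HDFS paths are placeholders.
import subprocess

def sqoop_import(table: str, target_dir: str) -> None:
    """Import a MySQL table into HDFS as tab-delimited text using 4 mappers."""
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:mysql://mysql-host:3306/sales_db",   # hypothetical source
        "--username", "etl_user",
        "--password-file", "/user/etl/.mysql_password",         # hypothetical HDFS file
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", "4",
        "--fields-terminated-by", "\t",
    ]
    subprocess.run(cmd, check=True)

def sqoop_export(table: str, export_dir: str) -> None:
    """Export analyzed HDFS data back into a MySQL table."""
    cmd = [
        "sqoop", "export",
        "--connect", "jdbc:mysql://mysql-host:3306/sales_db",
        "--username", "etl_user",
        "--password-file", "/user/etl/.mysql_password",
        "--table", table,
        "--export-dir", export_dir,
        "--input-fields-terminated-by", "\t",
    ]
    subprocess.run(cmd, check=True)

if __name__ == "__main__":
    sqoop_import("orders", "/data/raw/orders")
    sqoop_export("orders_summary", "/data/curated/orders_summary")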

TECHNICAL SKILLS:

Big Data & Hadoop Ecosystem: MapReduce, Spark 2.3, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cloudera Manager, StreamSets, Neo4j, Hadoop 3.0, Apache NiFi 1.6, Cassandra 3.11

RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access

Databases: Oracle, DB2, SQL Server.

Data Modeling Tools: Erwin R9.7/9.6, ER Studio V17

Operating Systems: Microsoft Windows Vista/7/8/10, UNIX, and Linux.

Packages: Microsoft Office 2016, Microsoft Project 2016, SAP, Microsoft Visio, and SharePoint Portal Server

Project Execution Methodologies: Agile, Ralph Kimball and Bill Inmon data warehousing methodologies, Rational Unified Process (RUP), Rapid Application Development (RAD), Joint Application Development (JAD)

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Cloud Platform: AWS, Azure, Google Cloud, CloudStack/OpenStack

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, Python, AWK, SED

PROFESSIONAL EXPERIENCE:

Confidential - Deerfield, IL

Sr. Big Data Engineer

Responsibilities:

  • As a Sr. Big Data Engineer, provided technical expertise and guidance on Hadoop technologies as they related to the development of analytics.
  • Expertise in writing Hadoop Jobs to analyze data using MapReduce, Hive, and Pig.
  • Experience in designing, building, and implementing a complete Hadoop ecosystem comprising MapReduce, HDFS, Hive, Pig, HBase, MongoDB, and Spark.
  • Worked within Scrum/Agile and Waterfall project execution methodologies.
  • Managed data movement from various file systems into HDFS using UNIX command-line utilities.
  • Worked in Azure environment for development and deployment of Custom Hadoop Applications.
  • Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
  • Designed and implemented scalable Cloud Data and Analytical architecture solutions for various public and private cloud platforms using Azure.
  • Performed detailed analysis of business problems and technical environments and used this analysis in designing solutions and maintaining the data architecture.
  • Designed and developed software applications, testing, and building automation tools.
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, Spark, and shell scripts (for scheduling a few jobs).
  • Involved in importing and exporting data between RDBMS and HDFS using Sqoop.
  • Performed querying of both managed and external tables created by Hive using Impala.
  • Implemented the Big Data solution using Hadoop and Hive to pull/load data into HDFS.
  • Installed and configured Hadoop and responsible for maintaining cluster and managing and reviewing Hadoop log files.
  • Implemented and configured workflows using Oozie to automate jobs.
  • Migrated the needed data from MySQL into HDFS using Sqoop and imported various formats of flat files into HDFS.
  • Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
  • Implemented various Azure platforms such as Azure SQL Database, Azure SQL Data Warehouse, Azure Analysis Services, HDInsight, Azure Data Lake and Data Factory.
  • Extracted and loaded data into Data Lake environment (MS Azure) by using Sqoop which was accessed by business users.
  • Managed and supported enterprise data warehouse operations and big data advanced predictive application development using Cloudera.
  • Extensively worked on Shell scripts for running SAS programs in batch mode on UNIX.
  • Created partitioned tables in Hive, designed a data warehouse using Hive external tables, and created Hive queries for analysis (see the sketch after this list).
  • Installed, configured, and maintained Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, ZooKeeper, and Sqoop.
  • Worked on creating Hive tables and wrote Hive queries for data analysis to meet business requirements; used Sqoop to import and export data from Oracle and MySQL.
  • Developed Pig scripts to transform the data into a structured format and automated them through Oozie coordinators.
  • Designed efficient and robust Hadoop solutions for performance improvement and end-user experiences.
  • Developed MapReduce programs for applying business rules on the data.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Developed and executed Hive queries for de-normalizing the data.
  • Created Hive tables on top of the loaded data and wrote Hive queries for ad-hoc analysis.
  • Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Support Cloud Strategy team to integrate analytical capabilities into an overall cloud architecture and business case development.
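
The partitioned external Hive tables and ad-hoc analysis queries referenced in this list can be sketched as follows; the database, table, columns, and HDFS location are hypothetical, and the original work may have run the same HiveQL through the Hive CLI or Oozie rather than PySpark.

# Sketch: partitioned external Hive table plus an ad-hoc query, via PySpark.
# Database, table, columns, and HDFS location are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("hive-warehouse-sketch")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("CREATE DATABASE IF NOT EXISTS sales")

# External table over data already landed in HDFS (e.g., by Sqoop), partitioned by load date.
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders_ext (
        order_id     BIGINT,
        customer_id  BIGINT,
        amount       DECIMAL(12,2)
    )
    PARTITIONED BY (load_date STRING)
    STORED AS PARQUET
    LOCATION 'hdfs:///data/warehouse/orders'
""")

# Register partitions that were written directly to HDFS.
spark.sql("MSCK REPAIR TABLE sales.orders_ext")

# Ad-hoc analysis query restricted to one partition.
daily_totals = spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales.orders_ext
    WHERE load_date = '2018-06-01'
    GROUP BY customer_id
""")
daily_totals.show()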

Environment: Hadoop 3.0, MapReduce, Hive 2.3, Pig 0.17, HDFS, HBase 1.2, MongoDB, Agile, Azure, MySQL, Oozie, Sqoop 1.4.

Confidential - Washington, DC

Sr. Data Engineer

Responsibilities:

  • Worked with SMEs and conducted JAD sessions; documented the requirements using UML and use-case diagrams.
  • As a Big Data implementation engineer, was responsible for developing, troubleshooting, and implementing programs.
  • Worked with Business Analysts to understand the user requirements, layout, and look of the interactive dashboard to be developed in Tableau.
  • Involved in all phases of SDLC using Agile and participated in daily scrum meetings with cross teams.
  • Developed Big Data solutions focused on pattern matching and predictive modeling
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data using Hadoop/Big Data concepts.
  • Implemented MapReduce programs to retrieve results from unstructured data set.
  • Optimized MapReduce Jobs to use HDFS efficiently by using various compression mechanisms.
  • Designed and worked on a Big Data analytics platform for processing customer interface preferences and comments using Hadoop, Hive, Pig, and Cloudera.
  • Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Worked on reading multiple data formats on HDFS using Scala.
  • Installed and configured Pig and wrote Pig Latin scripts.
  • Developed multiple POCs using Scala and deployed them on the YARN cluster; compared the performance of Spark with Hive and SQL.
  • Built data platforms, pipelines, and storage systems using Apache Kafka, Apache Storm, and search technologies such as Elasticsearch.
  • Developed Spark scripts by using Python and Scala shell commands as per the requirement.
  • Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
  • Experienced in the AWS cloud environment, including S3 storage and EC2 instances.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streaming data in HDFS (see the sketch after this list).
  • Implemented Kafka producers, created custom partitions, configured brokers, and implemented high-level consumers to build the data platform.
  • Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it using MapReduce programs.
  • Implemented Installation and configuration of multi-node cluster on Cloud using Amazon Web Services (AWS) on EC2.
  • Created Hive External tables to stage data and then move the data from Staging to main tables
  • Pulled data from the data lake (HDFS) and massaged it with various RDD transformations.
  • Created Data Pipeline using Processor Groups and multiple processors using Apache NiFi for Flat File, RDBMS as part of a POC using Amazon EC2.
  • Built Hadoop solutions for big data problems using MR1 and MR2 on YARN.
  • Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
  • Used Hive to analyze data ingested into HBase by using Hive-HBase integration and compute various metrics for reporting on the dashboard.
  • Scheduled the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
  • Developed code for importing and exporting data into HDFS and Hive using Sqoop.
  • Developed customized classes for serialization and deserialization in Hadoop.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Utilized Oozie workflows to run Pig and Hive jobs; extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
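
A minimal sketch of the Kafka-to-HDFS flow referenced above, using Spark Structured Streaming; the broker addresses, topic name, and output paths are placeholders, and the original implementation may instead have used the older DStream-based Spark Streaming API.

# Sketch: consume real-time events from Kafka and persist them to HDFS as Parquet.
# Broker list, topic, and paths are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = (SparkSession.builder
         .appName("kafka-to-hdfs-sketch")
         .getOrCreate())

# Read the Kafka topic as a streaming DataFrame (requires the spark-sql-kafka package).
events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
          .option("subscribe", "customer-events")
          .option("startingOffsets", "latest")
          .load())

# Kafka keys/values arrive as binary; cast them to strings for downstream parsing.
parsed = events.select(
    col("key").cast("string").alias("event_key"),
    col("value").cast("string").alias("event_body"),
    col("timestamp"),
)

# Continuously append micro-batches to HDFS; the checkpoint directory tracks progress.
query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/streams/customer_events")
         .option("checkpointLocation", "hdfs:///checkpoints/customer_events")
         .outputMode("append")
         .start())

query.awaitTermination()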

Environment: Agile, Hadoop 3.0, MapReduce, HDFS, Hive 2.3, Pig 0.17, Scala, Spark, AWS, Python, Elasticsearch, Kafka 1.1

Confidential - Greensboro, NC

Sr. Data Analyst/Data Engineer

Responsibilities:

  • Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
  • Translated logical data models into physical database models and generated DDLs for DBAs.
  • Part of a team conducting logical data analysis and data modeling JAD sessions; communicated data-related standards.
  • Researched, evaluated, architected, and deployed new tools, frameworks, and patterns to build sustainable Big Data platforms.
  • Designed and developed the conceptual, logical, and physical data models to meet the needs of reporting.
  • Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
  • Responsible for the data architecture design delivery, data model development, review, approval and Data warehouse implementation.
  • Involved in designing and developing Data Models and Data Marts that support the Business Intelligence Data Warehouse.
  • Responsible for Big data initiatives and engagement including analysis, brainstorming, POC, and architecture.
  • Worked with the Hadoop ecosystem covering HDFS, HBase, YARN, and MapReduce.
  • Worked on NoSQL databases including Cassandra. Implemented multi-data center and multi-rack Cassandra cluster.
  • Performed Reverse Engineering of the current application using Erwin, and developed Logical and Physical data models for Central Model consolidation.
  • Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
  • Involved in extensive data validation by writing several complex SQL queries; involved in back-end testing and worked with data quality issues (see the sketch after this list).
  • Redefined many attributes and relationships in the reverse engineered model and cleansed unwanted tables/columns as part of Data Analysis responsibilities.
  • Designed the data marts using the Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin.
  • Created SQL tables with referential integrity, constraints and developed queries using SQL, SQL*PLUS and PL/SQL.
  • Performed gap analysis of the current state versus the desired state and documented requirements to control the gaps identified.
  • Worked on the reporting requirements and was involved in generating the reports for the data model using Crystal Reports.
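
A minimal sketch of the back-end data validation described in this list, assuming a generic ODBC connection; the DSN, credentials, schemas, and table names are hypothetical placeholders, not the project's actual objects.

# Sketch: back-end data validation checks run against a relational source.
# The DSN, credentials, schemas, and table names are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect("DSN=edw_dev;UID=qa_user;PWD=example")  # placeholder connection
cursor = conn.cursor()

# Row-count reconciliation between the staging table and the target dimension table.
cursor.execute("SELECT COUNT(*) FROM stg.customer")
stg_rows = cursor.fetchone()[0]
cursor.execute("SELECT COUNT(*) FROM dw.dim_customer")
dim_rows = cursor.fetchone()[0]
print(f"staging={stg_rows}, dimension={dim_rows}, diff={stg_rows - dim_rows}")

# Referential-integrity check: fact rows whose customer key has no matching dimension row.
cursor.execute("""
    SELECT COUNT(*)
    FROM dw.fact_sales f
    LEFT JOIN dw.dim_customer d ON f.customer_key = d.customer_key
    WHERE d.customer_key IS NULL
""")
orphan_rows = cursor.fetchone()[0]
print(f"orphan fact rows: {orphan_rows}")

cursor.close()
conn.close()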

Environment: NoSQL, Hadoop 3.0, HDFS, HBase 2.1, YARN, MapReduce (MRv2), Erwin R9.7, Cassandra 3.11, SQL, PL/SQL

Confidential - Reston, VA

Data Analyst/ Data Modeler

Responsibilities:

  • Worked with Business users for requirements gathering, business analysis and project coordination.
  • Worked closely with various business teams in gathering the business requirements.
  • Experienced in data cleansing and Data migration for accurate reporting
  • Translated business concepts into XML vocabularies by designing XML Schemas with UML.
  • Participated in all phases of data mining, data collection, data cleaning, developing models, validation, and visualization and performed Gap analysis.
  • Interacted with Business Analysts to gather the user requirements and participated in data modeling JAD sessions.
  • Performed Data mapping, logical data modeling, data mining, created class diagrams and ER diagrams and used SQL queries to filter data.
  • Created schema objects such as indexes, views, sequences, triggers, grants, roles, and snapshots.
  • Performed data analysis and data profiling using complex SQL on various source systems.
  • Worked on SQL queries in a dimensional data warehouse as well as a relational data warehouse.
  • Defined facts, dimensions and designed the data marts using the Ralph Kimball's Dimensional Data Mart modeling methodology using Erwin.
  • Involved with Business Analysts team in requirements gathering and in preparing functional specifications and changing them into technical specifications.
  • Coordinated with DBAs and generated SQL codes from data models.
  • Part of a team conducting logical data analysis and data modeling JAD sessions; communicated data-related standards.
  • Developed detailed ER diagram and data flow diagram using modeling tools following the SDLC structure.
  • Created 3NF business area data modeling with de-normalized physical implementation; data and information requirements analysis.
  • Used SQL tools to run SQL queries and validate the data loaded into the target tables.
  • Created tables, views, sequences, indexes, and constraints, and generated SQL scripts for implementing the physical data model (see the sketch after this list).
  • Created dimensional model for reporting system by identifying required dimensions and facts using Erwin.
  • Worked with and extracted data from various sources such as DB2, CSV, XML, and flat files into DataStage.
  • Generated ad-hoc reports in Excel Power Pivot and shared them via Power BI with decision makers for strategic planning.
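
A minimal sketch of the kind of star-schema DDL script generated from the physical data model described above; the dimension and fact tables shown are generic Kimball-style placeholders, not the project's actual model, which came out of Erwin.

# Sketch: write out Kimball-style star-schema DDL for the physical data model.
# Table and column names are hypothetical; the real DDL was generated from Erwin.
DIM_CUSTOMER_DDL = """
CREATE TABLE dim_customer (
    customer_key   INTEGER      NOT NULL PRIMARY KEY,
    customer_id    VARCHAR(20)  NOT NULL,
    customer_name  VARCHAR(100),
    region         VARCHAR(50)
);
"""

FACT_SALES_DDL = """
CREATE TABLE fact_sales (
    sale_key       INTEGER      NOT NULL PRIMARY KEY,
    customer_key   INTEGER      NOT NULL,
    date_key       INTEGER      NOT NULL,
    sale_amount    DECIMAL(12,2),
    CONSTRAINT fk_sales_customer FOREIGN KEY (customer_key)
        REFERENCES dim_customer (customer_key)
);
"""

def write_ddl_script(path: str) -> None:
    """Concatenate the model's DDL statements into a single deployment script."""
    with open(path, "w") as script:
        script.write(DIM_CUSTOMER_DDL)
        script.write(FACT_SALES_DDL)

if __name__ == "__main__":
    write_ddl_script("star_schema.sql")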

Environment: XML, SQL, Erwin R9.6, 3NF, DB2, CSV, MS Excel 2014, Power BI, Flat Files, JAD.

Confidential

Data Analyst

Responsibilities:

  • Analyzed data quality issues against source system and prepared a data quality document confirming all the source data quality.
  • Evaluated data profiling, cleansing, integration and extraction tools.
  • Worked on creating Excel Reports which includes Pivot tables and Pivot charts.
  • Involved in extensive data validation by writing SQL queries; involved in back-end testing and worked with data quality issues.
  • Managed all indexing, debugging and query optimization techniques for performance tuning using T-SQL.
  • Worked with the application and Business Analyst team to develop requirements.
  • Involved in extensive data validation by writing several complex SQL queries.
  • Involved in back-end testing and worked with data quality issues.
  • Involved in Data profiling and performed Data Analysis based on the requirements, which helped in catching many Sourcing Issues upfront.
  • Generated comprehensive analytical reports by running SQL queries against current databases to conduct data analysis.
  • Wrote T-SQL statements for retrieval of data and Involved in performance tuning of T-SQL queries and Stored Procedures.
  • Designed/developed tables, views, various SQL queries, stored procedures, functions.
  • Involved in PL/SQL code review and modification for the development of new requirements.
  • Extracted data from existing data source and performed ad-hoc queries.
  • Utilized SAS and SQL extensively for collecting, validating and analyzing the raw data received from the client.
  • Executed data extraction programs/data profiling and analyzing data for accuracy and quality.
  • Analyzed the data using advanced Excel functions like pivot tables, VLOOKUP, and visualizations to get a descriptive analysis of the data (see the sketch after this list).
  • Created schema objects such as indexes, views, sequences, triggers, grants, roles, and snapshots.
  • Used advanced Microsoft Excel to create pivot tables and other Excel functions to prepare reports and dashboards with user data.
  • Maintained numerous monthly scripts executed on a monthly basis; produced reports and submitted them on time for business review.
  • Developed ad-hoc reports using Crystal reports for performance analysis by business users.
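
The Excel pivot-table analysis referenced in this list can be approximated in pandas; the sketch below uses a hypothetical extract file and columns, and is only a Python analogue of the original work, which was done with Excel pivot tables and VLOOKUP rather than code.

# Sketch: a pandas approximation of the Excel pivot-table summaries described above.
# The extract file and column names are hypothetical placeholders.
import pandas as pd

# Raw extract pulled from the source database (e.g., via a SQL query).
raw = pd.read_csv("monthly_extract.csv", parse_dates=["order_date"])

# Pivot: total and average amount by region and month, mirroring an Excel pivot table.
raw["month"] = raw["order_date"].dt.to_period("M")
pivot = pd.pivot_table(
    raw,
    index="region",
    columns="month",
    values="amount",
    aggfunc=["sum", "mean"],
)

# Simple descriptive statistics for the monthly business review.
summary = raw.groupby("region")["amount"].describe()

print(pivot.head())
print(summary)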

Environment: SQL, PL/SQL, SAS, Microsoft Excel 2010, T-SQL, Pivot Tables, VLOOKUP, triggers, Stored Procedures.
