Sr. Big Data Engineer Resume
Mt Laurel, NJ
SUMMARY:
- Overall 9+ years of professional experience in Software Systems Development and Business Systems, including design and development as a Big Data Engineer/Data Modeler/Data Analyst.
- Good experience in all phases of SDLC and participated in daily scrum meetings with cross-functional teams.
- Excellent experience in developing and designing data integration and migration solutions in MS Azure.
- Experience in Hadoop components like HDFS, MapReduce, Job Tracker, Name Node, Data Node, Task Tracker and Apache Spark.
- Excellent understanding of and hands-on experience with AWS, AWS S3 and EC2.
- Expert in building Enterprise Data Warehouses or Data Warehouse appliances from scratch using both the Kimball and Inmon approaches.
- Experience in importing and exporting data using stream processing platforms like Flume and Kafka.
- Good experience in using SSIS and SSRS in creating and managing reports for an organization.
- Proficient knowledge in designing and implementing data structures and using common business intelligence tools for data analysis.
- Profound experience in working with Cloudera (CDH4 & CDH5), Hortonworks and Amazon EMR Hadoop distributions on multi-node clusters.
- Extensive experience in writing Storm topologies to accept events from Kafka producers and emit them into Cassandra DB.
- Excellent experience working with data modeling tools like Erwin, PowerDesigner and ER/Studio.
- Proficient working experience on big data tools like Hadoop, Azure Data Lake, AWS Redshift.
- Strong experience in Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
- Excellent technical and analytical skills with clear understanding of design goals and development for OLTP and dimensional modeling for OLAP.
- Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Designing and Developing Oracle PL/SQL and Shell Scripts, Data Conversions and Data Cleansing.
- Participating in requirements sessions to gather requirements along with business analysts and product owners.
- Experience in designing components using UML Use Case, Class, Sequence, Deployment, and Component diagrams based on the requirements.
- Experience implementing a log producer in Scala that watches application logs and transforms incremental logs.
- Extensive experience in writing UNIX shell scripts and automation of the ETL processes using UNIX shell scripting.
- Strong experience in using Excel and MS Access to dump the data and analyze based on business needs.
- Experience in designing the Data Mart and creation of Cubes.
- Experience in Data transformation, Data mapping from source to target database schemas, Data Cleansing procedures.
- Performing extensive data profiling and analysis for detecting and correcting inaccurate data from the databases and to track data quality.
- Experience in Performance Tuning and query optimization techniques in transactional and Data Warehouse Environments.
- Experience in Data Cleaning and Data Preprocessing using Python Scripting.
- Experience in using SSIS in solving complex business problems.
- Proficient in writing DDL, DML commands using SQL developer and Toad.
- Expertise in performing User Acceptance Testing (UAT) and conducting end user training sessions.
TECHNICAL SKILLS:
Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cassandra 3.11
Data Modeling Tools: Erwin r9.7, Rational System Architect, IBM InfoSphere Data Architect, ER/Studio v16
BI Tools: Tableau 10, SAP Business Objects, Crystal Reports
Methodologies: Agile, SDLC, Ralph Kimball data warehousing methodology, Joint Application Development (JAD)
RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access
Operating Systems: Microsoft Windows 7/8 and 10, UNIX, and Linux.
Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, and SharePoint Portal Server
Version Tool: VSS, SVN, CVS.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Cloud Platform: AWS, Azure, Google Cloud, CloudStack/OpenStack
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.
PROFESSIONAL EXPERIENCE:
Confidential - Mt Laurel, NJ
Sr. Big Data Engineer
Responsibilities:
- As a Big Data Engineer, developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, Shell Scripting, Hive.
- Developed a data pipeline using Kafka, Sqoop, Hive and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Worked with Microsoft Azure Cloud services, Storage Accounts, Azure data storage and Azure Data Factory.
- Used Agile Methodology of Data Warehouse development using Kanbanize.
- Exported event weblogs to HDFS by creating an HDFS sink that deposits the weblogs directly in HDFS.
- Worked in writing Hadoop Jobs for analyzing data using Hive, Pig accessing Text format files, sequence files, Parquet files.
- Integrated Oozie with Pig, Hive, and Sqoop and developed Oozie workflows for scheduling and orchestrating the ETL process within the Cloudera Hadoop distribution.
- Collaborated with other data modeling team members to ensure design consistency and integrity.
- Good experience working with Hadoop distributions such as HORTONWORKS and CLOUDERA.
- Worked with Sqoop for importing and exporting data from different databases like MySQL and Oracle into HDFS and Hive.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Worked on a POC to perform sentiment analysis of Twitter data using Spark Streaming.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Imported data from SQL Server to HDFS using Python with the Sqoop framework.
- Exported data from HDFS to MySQL using Python with the HAWQ framework.
- Developed complete end to end Big-data processing in Hadoop eco system.
- Developed customized classes for serialization and Deserialization in Hadoop.
- Worked closely with the SSIS and SSRS developers to explain the complex data transformation logic.
- Worked closely with business analyst for requirement gathering and translating into technical documentation.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Created and worked Sqoop jobs with incremental load to populate Hive External tables.
- Worked in MongoDB and UNIX environments to clean up and group non-SQL data and create analysis reports.
- Imported data from different sources like HDFS/HBase into Spark RDDs.
- Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDD and Scala.
- Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
- Created external tables pointing to HBase to access tables with a huge number of columns.
- Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
- Worked on Cassandra for retrieving data from Cassandra clusters to run queries.
- Extensively used Erwin for developing data model using star schema methodologies.
- Created and maintained MySQL databases, set up users, and maintained backups of cluster metadata databases.
- Scheduled Oozie workflows to run multiple Hive and Pig jobs.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Imported millions of structured records from relational databases using Sqoop import, processed them using Spark, and stored the data in HDFS in CSV format.
- Planned, developed and migrated servers, relational databases (SQL) and websites to Microsoft Azure.
- Involved in HDFS maintenance and loading of structured and unstructured data, imported data from mainframe datasets to HDFS using Sqoop, and wrote PySpark scripts to process the HDFS data.
- Used Spark Streaming with Scala to receive real-time data from Kafka and store the stream data in HDFS and NoSQL databases such as HBase and Cassandra.
- Documented the requirements, including the available code to be implemented using Spark, Hive, HDFS, HBase and Elasticsearch.
- Generated multiple ad-hoc Python tools and scripts to facilitate map generation and data manipulation.
Environment: Big Data, Hadoop 3.0, Agile, Hive 2.3, PySpark, HDFS, Oracle 12c, HBase 1.2, Flume 1.8, Pig 0.17, Oozie 4.3, SSIS, SSRS, SQL, PL/SQL, Cassandra 3.11, MongoDB, ETL, Sqoop
Confidential - Lowell, AR
Big Data Engineer
Responsibilities:
- Worked as a Data Engineer on several Hadoop Ecosystem components like HBase, Sqoop, Oozie, Hive and Pig with Cloudera Hadoop distribution.
- Worked on managing and reviewing Hadoop log files. Tested and reported defects within an Agile methodology.
- Worked on migrating Pig scripts to Spark and Spark SQL to improve performance.
- Experience working with Cloudera Distribution Hadoop (CDH) and Hortonworks Data Platform (HDP).
- Extensively involved in writing Oracle, PL/SQL, stored procedures, functions and packages.
- Developed Spark code using Python/Scala and Spark SQL for faster testing and processing of data.
- Loaded data from different sources (databases and files) into Hive using the Talend tool.
- Worked with NoSQL databases like HBase in creating tables to load large sets of semi-structured data coming from source systems.
- Worked on interviewing business users to gather requirements and documenting the requirements.
- Developed real time data processing applications by using Scala and Python and implemented Apache Spark Streaming from various streaming sources like Kafka.
- Used Flume to collect, aggregate, and store the web log data from different sources like web servers, mobile and network devices and pushed to HDFS.
- Imported and exported data into HDFS and Hive using Sqoop and Flume.
- Used Pattern matching algorithms to recognize the customer across different sources and built risk profiles for each customer using Hive and stored the results in HBase.
- Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
- Developed and maintained stored procedures, implemented changes to database design including tables.
- Ingested data from various sources and processed the Data-at-Rest utilizing Big Data technologies such as Hadoop, HBase, and Hive.
- Developed advanced PL/SQL packages, procedures, triggers, functions, indexes and collections to implement business logic using SQL Navigator.
- Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at that time.
- Provided thought leadership for the architecture and design of Big Data Analytics solutions for customers, actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations, and implemented a Big Data solution.
- Created Integration Relational 3NF models that can functionally relate to other subject areas and determined transformation rules accordingly in the Functional Specification Document.
- Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.
- Involved in reports development using reporting tools like Tableau.
- Loaded and transformed huge sets of structured, semi-structured and unstructured data.
- Developed and implemented logical and physical data models using the enterprise modeling tool Erwin.
- Created Hive queries and tables that helped line of business identify trends by applying strategies on historical data before promoting them to production.
- Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
- Designed and developed cubes using SQL Server Analysis Services (SSAS) using Microsoft Visual Studio.
- Performed performance tuning of OLTP and Data warehouse environments using SQL.
- Created data structure to store the dimensions in an effective way to retrieve, delete and insert the data.
- Used Hive to analyze data ingested into HBase by using Hive-HBase integration and computed various metrics for reporting on the dashboard.
- Implemented referential integrity using primary key and foreign key relationships.
- Developed staging jobs using data from different sources such as flat files, Excel files, and Oracle databases.
Environment: HBase, Oozie 4.3, Hive 2.3, Sqoop 1.4, PySpark, SDLC, OLTP, SSAS, SQL, Oracle 12c, PL/SQL, ETL, AWS, Flume
Confidential - Hartford, CT
Data Analyst/Data Engineer
Responsibilities:
- Worked as a Sr. Data Analyst/Data Engineer to review business requirement and compose source to target data mapping documents.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Designed the HBase schemas based on the requirements and performed HBase data migration and validation.
- Involved in migration of data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing data.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Connected to Amazon Redshift through Tableau to extract live data for real time analysis.
- Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Imported the data from different sources like HDFS/HBase into Spark RDD and developed a data pipeline using Kafka and Storm to store data into HDFS.
- Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
- Worked on moving all log files generated from various sources to HDFS for further processing.
- Wrote Hive queries and Scala scripts to analyze data according to business requirements.
- Generated metadata and created Talend jobs and mappings to load the data warehouse and Data Lake.
- Translated business requirements into working logical and physical data models for OLTP & OLAP systems.
- Optimized the performance of queries with modification in T-SQL queries, established joins and created clustered indexes.
- Created HBase tables to store data in various formats coming from different sources.
- Created the system for a single source of truth on the Hadoop file system (HDFS), while enabling transparent data movement and access at various layers.
- Wrote Hive queries to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
- Developed SAS macros for data cleaning, reporting and to support routine processing.
- Embedded SQL queries in Excel and used Excel functions to calculate parameters such as standard deviation.
- Performed data analysis and statistical analysis, and generated reports and listings using SAS/SQL, SAS/ACCESS and SAS/EXCEL, pivot tables and graphs.
Environment: Erwin 9.5, SAS, SQL, HBase, Scala, T-SQL, AWS, Oozie, Hive 1.9, HDFS, PL/SQL, Excel.
Confidential - SFO, CA
Data Analyst/Data Modeler
Responsibilities:
- Worked on translating the business requirements into detailed, production-level technical specifications.
- Involved in regular interactions with Business Analysts and participated in data modeling JAD sessions.
- Developed data mapping documents between Legacy, Production, and User Interface Systems.
- Involved in the creation and maintenance of the Data Warehouse and repositories containing Metadata.
- Researched, evaluated, architected, and deployed new tools, frameworks and patterns to build sustainable Big Data platforms.
- Participated in meetings & JAD sessions to gather and collect requirements from the business end users.
- Designed and developed architecture for data services ecosystem spanning Relational, NoSQL, and Big Data technologies.
- Designed both 3NF Data models and dimensional Data models using Star and Snowflake schemas.
- Involved in Normalization/Denormalization techniques for optimum performance in relational and dimensional database environments.
- Developed Master data management strategies for storing reference data.
- Worked with Data Stewards and Business analysts to gather requirements for MDM Project.
- Designed logical, physical, relational and dimensional data models.
- Designed the Data Marts in dimensional data modeling using star and snowflake schemas.
- Involved in the complete SSIS life cycle: creating SSIS packages, and building, deploying and executing the packages in all environments.
- Worked with data compliance and data governance teams to maintain data models, Metadata, and Data Dictionaries that define source fields and their definitions.
- Created Mappings, Mapplets, Sessions and Workflows to replace the existing Stored Procedures.
- Handled performance requirements for databases in OLTP and OLAP models.
- Implemented Forward engineering to create tables, views and SQL scripts and mapping documents.
- Created schema objects such as indexes, views, sequences, triggers, grants, roles, and snapshots.
- Identified the entities and relationship between the entities to develop Conceptual Model using ER/Studio.
- Developed the Data warehouse model (Kimball's) with several data marts and conformed dimensions for the proposed model in the Project.
- Worked on attributes and relationships in the reverse engineered model to remove unwanted tables and columns.
- Created data masking mappings to mask the sensitive data between production and test environment.
Environment: ER/Studio v14, PL/SQL, SQL, OLTP, OLAP, 3NF, SSIS
Confidential
Data Analyst
Responsibilities:
- Worked with Data Analysts to understand Business logic and User Requirements.
- Closely worked with cross functional Data warehouse members to import data into SQL Server and connected to SQL Server to prepare spreadsheets.
- Created reports for the Data Analysis using SQL Server Reporting Services.
- Created VLOOKUP functions in MS Excel for searching data in large spreadsheets.
- Created SQL queries to simplify migration progress reports and analyses.
- Wrote SQL queries using joins, grouping, nested sub-queries, and aggregation depending on data needed from various relational customer databases.
- Developed Stored Procedures in SQL Server to consolidate common DML transactions such as insert, update and delete from the database.
- Developed reporting and various dashboards across all areas of the client's business to help analyze the data.
- Cleansed and manipulated data by sub-setting, sorting, and pivoting on need basis.
- Used SQL Server and MS Excel on daily basis to manipulate the data for business intelligence reporting needs.
- Developed the stored procedures as required, and user defined functions and triggers as needed using T-SQL.
- Designed data reports in Excel, for easy sharing, and used SSRS for report deliverables to aid in statistical data analysis and decision making.
- Created reports from OLAP, sub reports, bar charts and matrix reports using SSRS.
- Developed ad-hoc reports using VLOOKUPs, pivot tables, and macros in Excel, and recommended solutions to drive business decision making.
- Used Excel and PowerPoint on various projects as needed for presentations and summarization of data to provide insight on key business decisions.
- Designed Ad-hoc reports using SQL and Tableau dashboards, facilitating data driven decisions for business users.
- Extracted data from different sources performing Data Integrity and quality checks.
- Performed Data Analysis and Data Profiling and worked on data transformations and data quality rules.
- Involved in extensive data validation by writing several complex SQL queries, and involved in back-end testing and resolving data quality issues.
- Collected, analyzed and interpreted complex data for reporting and/or performance trend analysis.
- Performed Data Manipulation using MS Excel Pivot Sheets and produced various charts for creating the mock reports.
Environment: SQL Server, MS Excel 2010, VLOOKUP, T-SQL, SSRS, SSIS, OLAP, PowerPoint
