Sr. Big Data Engineer Resume Mt Laurel, NJ - Hire IT People

SUMMARY

Overall 9+ years of professional experience in Software Systems Development and Business Systems, including design and development as a Big Data Engineer/Data Modeler/Data Analyst.
Good experience in all phases of the SDLC, including participation in daily scrum meetings with cross-functional teams.
Excellent experience in developing and designing data integration and migration solutions in MS Azure.
Excellent understanding of and hands-on experience with AWS, including S3 and EC2.
Expert in building Enterprise Data Warehouses or data warehouse appliances from scratch using both the Kimball and Inmon approaches.
Experience in importing and exporting data using stream processing platforms like Flume and Kafka.
Good experience in using SSIS and SSRS in creating and managing reports for an organization.
Proficient in designing and implementing data structures and in using common business intelligence tools for data analysis.
Extensive experience in writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
Excellent working knowledge of data modeling tools like Erwin, PowerDesigner and ER Studio.
Proficient working experience on big data tools like Hadoop, Azure Data Lake, AWS Redshift.
Strong experience in Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
Excellent technical and analytical skills with a clear understanding of design goals and development for OLTP and dimensional modeling for OLAP.
Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
Experience in designing and developing Oracle PL/SQL and shell scripts for data conversion and data cleansing.
Participating in requirements sessions to gather requirements along with business analysts and product owners.
Experience in designing components using UML: Use Case, Class, Sequence, Deployment, and Component diagrams for the requirements.
Experience implementing a log producer in Scala that watches application logs and transforms incremental logs.
Extensive experience in writing UNIX shell scripts and automation of the ETL processes using UNIX shell scripting.
Strong experience in using Excel and MS Access to dump data and analyze it based on business needs.
Experience in designing the Data Mart and creation of Cubes.
Experience in Data transformation, Data mapping from source to target database schemas, Data Cleansing procedures.
Performing extensive data profiling and analysis for detecting and correcting inaccurate data from the databases and to track data quality.
Experience in Performance Tuning and query optimization techniques in transactional and Data Warehouse Environments.
Experience in using SSIS in solving complex business problems.
Proficient in writing DDL and DML commands using SQL Developer and Toad.
Expertise in performing User Acceptance Testing (UAT) and conducting end user training sessions.

TECHNICAL SKILLS

Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cassandra 3.11

Data Modeling Tools: Erwin r9.7, Rational System Architect, IBM InfoSphere Data Architect, ER Studio v16

BI Tools: Tableau 10, SAP Business Objects, Crystal Reports

Methodologies: Agile, SDLC, Ralph Kimball data warehousing methodology, Joint Application Development (JAD)

RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access

Operating Systems: Microsoft Windows 7/8 and 10, UNIX, and Linux.

Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, and SharePoint Portal Server

Version Control Tools: VSS, SVN, CVS.

OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9

Cloud Platform: AWS, Azure, Google Cloud, CloudStack/OpenStack

Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED

Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.

ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.

PROFESSIONAL EXPERIENCE

Confidential - Mt Laurel, NJ

Sr. Big Data Engineer

Responsibilities:

As a Big Data Engineer, developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, MapReduce, Shell Scripting, Hive.
Worked with Microsoft Azure Cloud services, Storage Accounts, Azure data storage and Azure Data Factory.
Followed Agile methodology for data warehouse development, using Kanbanize.
Exported event weblogs to HDFS by creating an HDFS sink that deposits the weblogs directly in HDFS.
Integrated Oozie with MapReduce, Pig, Hive, and Sqoop and developed Oozie workflows for scheduling and orchestrating the ETL process within the Cloudera Hadoop distribution.
Collaborated with other data modeling team members to ensure design consistency and integrity.
Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
Worked on Apache NiFi as an ETL tool for batch processing and real-time processing.
Worked on a POC to perform sentiment analysis of Twitter data using Spark Streaming.
Extracted data from Oracle and uploaded it to Teradata tables using the Teradata utilities FastLoad and MultiLoad.
Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
Developed complete end-to-end big data processing in the Hadoop ecosystem.
Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
Developed customized classes for serialization and deserialization in Hadoop.
Worked closely with the SSIS and SSRS developers to explain complex data transformation logic.
Worked closely with business analysts on requirement gathering and on translating requirements into technical documentation.
Analyzed large data sets to determine the optimal way to aggregate and report on them.
Worked in a MongoDB and UNIX environment to clean up and group NoSQL data and create analysis reports.
Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
Migrated MapReduce jobs to the Spark framework, rewriting most of the existing MapReduce jobs in Spark with Scala for performance.
Created external tables pointing to HBase to access tables with a huge number of columns.
Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
Worked with Cassandra, running queries to retrieve data from Cassandra clusters.
Extensively used Erwin for developing data models using star schema methodology.
Created and maintained MySQL databases, set up users, and maintained backups of the cluster metadata databases.
Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
Imported millions of structured records from relational databases using Sqoop import, processed them using Spark, and stored the data in HDFS in CSV format.
Planned, developed and migrated servers, relational databases (SQL) and websites to Microsoft Azure.
Used Spark Streaming with Scala to receive real-time data from Kafka and store the stream data in HDFS and in NoSQL databases such as HBase and Cassandra.
Documented the requirements, including the existing code to be implemented using Spark, Hive, HDFS, HBase and Elasticsearch.
Developed Spark code using Scala for faster testing and processing of data.
Explored MLlib algorithms in Spark to understand the possible Machine Learning functionalities that can be used for our use case.
Generated multiple ad-hoc Python tools and scripts to facilitate map generation and data manipulation.

Environment: Big Data, Hadoop 3.0, Agile, Apache NiFi, Hive 2.3, MapReduce, HDFS, Oracle 12c, Spark 2.3, HBase 1.2, Flume 1.8, Pig 0.17, Sqoop 1.4, Oozie 4.3, SSIS, SSRS, SQL, PL/SQL, Cassandra 3.11, MongoDB, ETL.

Confidential - Lowell, AR

Data Engineer

Responsibilities:

Worked as a Data Engineer on several Hadoop Ecosystem components like HBase, Sqoop, Oozie, Hive and Pig with Cloudera Hadoop distribution.
Worked on managing and reviewing Hadoop log files. Tested and reported defects from an Agile methodology perspective.
Worked on migrating Pig scripts and MapReduce programs to Spark and Spark SQL to improve performance.
Extensively involved in writing Oracle PL/SQL stored procedures, functions and packages.
Loaded data from different sources (databases and files) into Hive using the Talend tool.
Worked with NoSQL databases like HBase in creating tables to load large sets of semi structured data coming from source systems.
Used all major ETL transformations to load the tables through Informatica mappings.
Interviewed business users to gather and document requirements.
Used Pattern matching algorithms to recognize the customer across different sources and built risk profiles for each customer using Hive and stored the results in HBase.
Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
Developed and maintained stored procedures, implemented changes to database design including tables.
Ingested data from various sources and processed the data at rest utilizing Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
Developed advanced PL/SQL packages, procedures, triggers, functions, indexes and collections to implement business logic using SQL Navigator.
Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at that time.
Provided thought leadership for the architecture and design of Big Data Analytics solutions for customers, actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations, and implemented a Big Data solution.
Created integrated relational 3NF models that functionally relate to other subject areas, and determined the corresponding transformation rules in the Functional Specification Document.
Involved in reports development using reporting tools like Tableau.
Loaded and transformed huge sets of structured, semi structured and unstructured data.
Developed and implemented logical and physical data models using the enterprise modeling tool Erwin.
Created Hive queries and tables that helped the line of business identify trends by applying strategies on historical data before promoting them to production.
Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
Designed and developed cubes using SQL Server Analysis Services (SSAS) using Microsoft Visual Studio.
Performed performance tuning of OLTP and Data warehouse environments using SQL.
Created data structures to store the dimensions so that data can be retrieved, deleted and inserted efficiently.
Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
Used Hive to analyze data ingested into HBase through Hive-HBase integration and to compute various metrics for reporting on the dashboard.
Implemented referential integrity using primary key and foreign key relationships.
Developed staging jobs using data from different sources such as flat files, Excel files, and Oracle databases.

Environment: HBase, Sqoop 1.4, Oozie 4.3, Hive 2.3, SDLC, MapReduce, OLTP, SSAS, SQL, Oracle 12c, PL/SQL, ETL, AWS

Confidential - Hartford, CT

Data Analyst/Data Engineer

Responsibilities:

Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
Developed workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing it with Pig.
Designed HBase schemas based on the requirements and performed HBase data migration and validation.
Involved in migrating data from existing RDBMS (Oracle and SQL Server) to Hadoop using Sqoop for processing.
Developed numerous MapReduce jobs in Scala for Data Cleansing and Analyzing Data in Impala.
Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
Connected to Amazon Redshift through Tableau to extract live data for real time analysis.
Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
Imported data from different sources like HDFS/HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data in HDFS.
Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
Worked on moving all log files generated from various sources to HDFS for further processing.
Wrote Hive and Scala scripts to analyze data according to business requirements.
Generated metadata and created Talend jobs and mappings to load the data warehouse and Data Lake.
Translated business requirements into working logical and physical data models for OLTP & OLAP systems.
Optimized the performance of queries with modification in T-SQL queries, established joins and created clustered indexes.
Created HBase tables to store data in various formats coming from different sources.
Created the system for a single source of truth on the Hadoop file system (HDFS), while enabling transparent data movement and access at various layers.
Wrote Hive queries to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
Developed SAS macros for data cleaning, reporting and to support routine processing.
Embedded SQL queries in Excel and used Excel functions to calculate parameters such as standard deviation.
Performed data analysis and statistical analysis, and generated reports and listings using SAS/SQL, SAS/ACCESS and SAS/EXCEL, along with pivot tables and graphs.

Environment: Erwin 9.5, SAS, SQL, HBase, MapReduce, Scala, T-SQL, AWS, Oozie, Hive 1.9, HDFS, PL/SQL, Excel.
