- Over 9 years of professional experience in software systems development and business systems, including design and development work as a Big Data Engineer, Data Modeler, and Data Analyst.
- Good experience in all phases of the SDLC, including participation in daily scrum meetings with cross-functional teams.
- Excellent experience in developing and designing data integration and migration solutions in MS Azure.
- Excellent understanding of and hands-on experience with AWS, including S3 and EC2.
- Expert in building Enterprise Data Warehouses or data warehouse appliances from scratch using both the Kimball and Inmon approaches.
- Experience in importing and exporting data using stream processing platforms like Flume and Kafka.
- Good experience in using SSIS and SSRS in creating and managing reports for an organization.
- Proficient in designing and implementing data structures and in using common business intelligence tools for data analysis.
- Extensive experience in writing Storm topologies that accept events from Kafka producers and emit them into Cassandra.
- Excellent working knowledge of data modeling tools like Erwin, PowerDesigner and ER Studio.
- Proficient working experience with big data tools like Hadoop, Azure Data Lake, and AWS Redshift.
- Strong experience in Data Migration, Data Cleansing, Transformation, Integration, Data Import, and Data Export.
- Excellent technical and analytical skills, with a clear understanding of design goals and development for OLTP and dimensional modeling for OLAP.
- Strong experience in migrating data warehouses and databases into Hadoop/NoSQL platforms.
- Experience in designing and developing Oracle PL/SQL and shell scripts for data conversion and data cleansing.
- Experience in participating in requirements-gathering sessions with business analysts and product owners.
- Experience in designing components using UML: Use Case, Class, Sequence, Deployment, and Component diagrams for the requirements.
- Experience in implementing a log producer in Scala that watches application logs and transforms incremental logs.
- Extensive experience in writing UNIX shell scripts and automation of the ETL processes using UNIX shell scripting.
- Strong experience in using Excel and MS Access to dump data and analyze it based on business needs.
- Experience in designing the Data Mart and creation of Cubes.
- Experience in data transformation, data mapping from source to target database schemas, and data cleansing procedures.
- Experience in performing extensive data profiling and analysis to detect and correct inaccurate data in databases and to track data quality.
- Experience in Performance Tuning and query optimization techniques in transactional and Data Warehouse Environments.
- Experience in using SSIS in solving complex business problems.
- Proficient in writing DDL and DML commands using SQL Developer and Toad.
- Expertise in performing User Acceptance Testing (UAT) and conducting end user training sessions.
Big Data & Hadoop Ecosystem: Hadoop 3.0, HBase 1.2, Hive 2.3, Pig 0.17, Solr 7.2, Flume 1.8, Sqoop 1.4, Kafka 1.0.1, Oozie 4.3, Hue, Cassandra 3.11
Data Modeling Tools: Erwin r9.7, Rational System Architect, IBM InfoSphere Data Architect, ER Studio v16
BI Tools: Tableau 10, SAP Business Objects, Crystal Reports
Methodologies: Agile, SDLC, Ralph Kimball data warehousing methodology, Joint Application Development (JAD)
RDBMS: Microsoft SQL Server 2017, Teradata 15.0, Oracle 12c, and MS Access
Operating Systems: Microsoft Windows 7/8 and 10, UNIX, and Linux.
Packages: Microsoft Office 2019, Microsoft Project, SAP, Microsoft Visio 2019, and SharePoint Portal Server
Version Control Tools: VSS, SVN, CVS.
OLAP Tools: Tableau, SAP BO, SSAS, Business Objects, and Crystal Reports 9
Cloud Platforms: AWS, Azure, Google Cloud, CloudStack/OpenStack
Programming Languages: SQL, PL/SQL, UNIX shell Scripting, PERL, AWK, SED
Databases: Oracle 12c/11g, Teradata R15/R14, MS SQL Server 2016/2014, DB2.
ETL/Data warehouse Tools: Informatica 9.6/9.1, SAP Business Objects XIR3.1/XIR2, Talend, Tableau, and Pentaho.
Confidential - Mt Laurel, NJ
Sr. Big Data Engineer
- As a Big Data Engineer, developed Big Data analytic solutions on a Hadoop-based platform and engaged clients in technical discussions.
- Responsible for building scalable distributed data solutions using Big Data technologies like Apache Hadoop, MapReduce, Shell Scripting, Hive.
- Worked with Microsoft Azure Cloud services, Storage Accounts, Azure data storage and Azure Data Factory.
- Followed an Agile methodology for data warehouse development, using Kanbanize.
- Exported event weblogs to HDFS by creating an HDFS sink that deposits the weblogs directly in HDFS.
- Integrated Oozie with MapReduce, Pig, Hive, and Sqoop and developed Oozie workflows for scheduling and orchestrating the ETL process within the Cloudera Hadoop distribution.
- Collaborated with other data modeling team members to ensure design consistency and integrity.
- Wrote complex Hive queries to extract data from heterogeneous sources (Data Lake) and persist the data into HDFS.
- Worked with Apache NiFi as an ETL tool for batch processing and real-time processing.
- Worked on a POC to perform sentiment analysis of Twitter data using Spark Streaming.
- Extracted data from Oracle and uploaded it to Teradata tables using the Teradata utilities FastLoad and MultiLoad.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Developed complete end-to-end big data processing in the Hadoop ecosystem.
- Involved in developing the MapReduce framework, writing queries, and scheduling MapReduce jobs.
- Developed customized classes for serialization and deserialization in Hadoop.
- Worked closely with the SSIS and SSRS developers to explain complex data transformation logic.
- Worked closely with business analysts to gather requirements and translate them into technical documentation.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Worked in MongoDB and UNIX environments to clean up and group NoSQL data and create analysis reports.
- Continuously tuned Hive UDFs for faster queries by employing partitioning and bucketing.
- Migrated MapReduce jobs to the Spark framework, rewriting most of the existing jobs in Scala for better performance.
- Created external tables pointing to HBase to access tables with a huge number of columns.
- Extensively used Pig for data cleansing using Pig scripts and Embedded Pig scripts.
- Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark.
- Ran queries against Cassandra clusters to retrieve data.
- Extensively used Erwin for developing data models using star schema methodology.
- Created and maintained MySQL databases, set up users, and maintained backups of cluster metadata databases.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Imported millions of structured records from relational databases using Sqoop import, processed them with Spark, and stored the data in HDFS in CSV format.
- Planned, developed, and migrated servers, relational databases (SQL) and websites to Microsoft Azure.
- Used Spark Streaming with Scala to receive real-time data from Kafka and store the stream data in HDFS and in NoSQL databases such as HBase and Cassandra.
- Documented the requirements, including the existing code to be implemented using Spark, Hive, HDFS, HBase and Elasticsearch.
- Developed Spark code using Scala for faster testing and processing of data.
- Explored MLlib algorithms in Spark to understand the possible Machine Learning functionalities that can be used for our use case.
- Generated multiple ad-hoc Python tools and scripts to facilitate map generation and data manipulation.
Environment: Big Data, Hadoop 3.0, Agile, Apache NiFi, Hive 2.3, MapReduce, HDFS, Oracle 12c, Spark 2.3, HBase 1.2, Flume 1.8, Pig 0.17, Sqoop 1.4, Oozie 4.3, SSIS, SSRS, SQL, PL/SQL, Cassandra 3.11, MongoDB, ETL.
Confidential - Lowell, AR
- Worked as a Data Engineer on several Hadoop Ecosystem components like HBase, Sqoop, Oozie, Hive and Pig with Cloudera Hadoop distribution.
- Managed and reviewed Hadoop log files. Tested and reported defects within an Agile methodology.
- Worked on migrating Pig scripts and MapReduce programs to Spark and Spark SQL to improve performance.
- Extensively involved in writing Oracle PL/SQL stored procedures, functions and packages.
- Loaded data from different sources (databases and files) into Hive using Talend.
- Worked with NoSQL databases like HBase, creating tables to load large sets of semi-structured data coming from source systems.
- Used all major ETL transformations to load the tables through Informatica mappings.
- Interviewed business users to gather and document requirements.
- Used pattern matching algorithms to recognize customers across different sources, built risk profiles for each customer using Hive, and stored the results in HBase.
- Implemented a proof of concept deploying this product in Amazon Web Services (AWS).
- Developed and maintained stored procedures, implemented changes to database design including tables.
- Ingested data from various sources and processed data at rest using Big Data technologies such as Hadoop, MapReduce frameworks, HBase, and Hive.
- Developed advanced PL/SQL packages, procedures, triggers, functions, indexes and collections to implement business logic using SQL Navigator.
- Worked with AWS to implement client-side encryption, as DynamoDB did not support encryption at rest at the time.
- Provided thought leadership on the architecture and design of Big Data analytics solutions for customers, actively drove Proof of Concept (POC) and Proof of Technology (POT) evaluations, and implemented a Big Data solution.
- Created integrated relational 3NF models that functionally relate to other subject areas, and determined the corresponding transformation rules in the Functional Specification Document.
- Involved in reports development using reporting tools like Tableau.
- Loaded and transformed huge sets of structured, semi-structured and unstructured data.
- Developed and implemented logical and physical data models using the enterprise modeling tool Erwin.
- Created Hive queries and tables that helped lines of business identify trends by applying strategies to historical data before promoting them to production.
- Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for Business analysis.
- Wrote Hive join queries to fetch information from multiple tables and wrote multiple MapReduce jobs to collect output from Hive.
- Designed and developed cubes with SQL Server Analysis Services (SSAS) in Microsoft Visual Studio.
- Tuned the performance of OLTP and data warehouse environments using SQL.
- Created data structures to store dimensions so that data can be retrieved, deleted, and inserted efficiently.
- Used Oozie to orchestrate the MapReduce jobs that extract the data in a timely manner.
- Used Hive to analyze data ingested into HBase through Hive-HBase integration and computed various metrics for reporting on the dashboard.
- Implemented referential integrity using primary key and foreign key relationships.
- Developed staging jobs using data from different sources such as flat files, Excel files, and Oracle databases.
Environment: HBase, Sqoop 1.4, Oozie 4.3, Hive 2.3, SDLC, MapReduce, OLTP, SSAS, SQL, Oracle 12c, PL/SQL, ETL, AWS
Confidential - Hartford, CT
Data Analyst/Data Engineer
- Worked as a Sr. Data Analyst/Data Engineer to review business requirements and compose source-to-target data mapping documents.
- Developed workflows in Oozie to automate loading data into HDFS and pre-processing it with Pig.
- Designed HBase schemas based on the requirements and handled HBase data migration and validation.
- Involved in migrating data from existing RDBMSs (Oracle and SQL Server) to Hadoop using Sqoop for processing.
- Developed numerous MapReduce jobs in Scala for data cleansing and analyzed data in Impala.
- Involved in PL/SQL query optimization to reduce the overall run time of stored procedures.
- Connected to Amazon Redshift through Tableau to extract live data for real-time analysis.
- Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract data from weblogs and store it in HDFS.
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop, using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Imported data from different sources like HDFS/HBase into Spark RDDs and developed a data pipeline using Kafka and Storm to store data in HDFS.
- Installed and configured Apache Hadoop on multiple nodes on AWS EC2.
- Worked on moving all log files generated from various sources to HDFS for further processing
- Wrote Hive queries and Scala scripts to analyze data according to business requirements.
- Generated metadata and created Talend jobs and mappings to load the data warehouse and Data Lake.
- Translated business requirements into working logical and physical data models for OLTP & OLAP systems.
- Optimized query performance by modifying T-SQL queries, establishing joins, and creating clustered indexes.
- Created HBase tables to store data in various formats coming from different sources.
- Created the system for a single source of truth on the Hadoop file system (HDFS), while enabling transparent data movement and access at various layers.
- Wrote Hive queries to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Developed SAS macros for data cleaning, reporting, and support of routine processing.
- Embedded SQL queries in Excel and used Excel functions to calculate parameters such as standard deviation.
- Performed data analysis and statistical analysis, and generated reports and listings using SAS/SQL, SAS/ACCESS and SAS/EXCEL, pivot tables and graphs.
Environment: Erwin 9.5, SAS, SQL, HBase, MapReduce, Scala, T-SQL, AWS, Oozie, Hive 1.9, HDFS, PL/SQL, Excel.