Big Data Engineer Resume
PROFESSIONAL SUMMARY:
- 7 years of technical experience in data analysis and data modeling of clients' business needs, developing effective and efficient solutions, and ensuring client deliverables within committed timelines.
- Experience in Python, Scala, C, and SQL, along with a thorough understanding of distributed systems architecture and parallel processing frameworks.
- Constructing and manipulating large datasets of structured, semi-structured, and unstructured data and supporting systems application architecture, using tools such as SAS, SQL, Python, R, Minitab, Power BI, and more to extract multi-factor interactions and drive change.
- Wrote scripts using the S3 CLI tools to create new snapshots and delete existing snapshots in S3; a sketch of this pattern appears after this list.
- Expertise with Hive data warehouse architecture, including table creation, data distribution using partitioning and bucketing, and query development and tuning in Hive Query Language; a partitioning/bucketing sketch appears after this list.
- Deep knowledge and strong deployment experience in the Hadoop and Big Data ecosystem: HDFS, MapReduce, Spark, Pig, Sqoop, Hive, Oozie, Kafka, ZooKeeper, and HBase.
- Good understanding of client-server architecture for developing applications using TCP and UDP.
- Used software development methodologies such as Agile, Scrum, TDD, and Waterfall.
- Experience in optimizing volumes and EC2 instances, creating multiple VPC instances, and setting up alarms and notifications for EC2 instances using CloudWatch; an alarm-creation sketch appears after this list.
- Good experience in developing desktop and web applications using Java and Spring.
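
The S3 snapshot housekeeping mentioned above might be sketched roughly as follows. This is a minimal illustration using the boto3 SDK rather than the CLI itself, assuming "snapshot" means a dated copy of objects under a prefix; the bucket name, prefixes, and retention window are hypothetical placeholders.

```python
# Minimal sketch of an S3 snapshot housekeeping script using boto3;
# bucket name, prefixes, and retention window are hypothetical.
from datetime import datetime, timedelta, timezone

import boto3

s3 = boto3.client("s3")
BUCKET = "example-data-bucket"          # hypothetical bucket
SNAPSHOT_PREFIX = "snapshots/"          # hypothetical prefix holding dated copies
RETENTION_DAYS = 14                     # hypothetical retention window


def create_snapshot(source_prefix: str) -> None:
    """Copy every object under source_prefix into a dated snapshot prefix."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=source_prefix):
        for obj in page.get("Contents", []):
            s3.copy_object(
                Bucket=BUCKET,
                CopySource={"Bucket": BUCKET, "Key": obj["Key"]},
                Key=f"{SNAPSHOT_PREFIX}{stamp}/{obj['Key']}",
            )


def delete_expired_snapshots() -> None:
    """Delete snapshot objects older than the retention window."""
    cutoff = datetime.now(timezone.utc) - timedelta(days=RETENTION_DAYS)
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=SNAPSHOT_PREFIX):
        for obj in page.get("Contents", []):
            if obj["LastModified"] < cutoff:
                s3.delete_object(Bucket=BUCKET, Key=obj["Key"])


if __name__ == "__main__":
    create_snapshot("raw/")             # hypothetical source prefix
    delete_expired_snapshots()
```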
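The Hive partitioning and bucketing work described above could look like the following minimal sketch, run here through Spark SQL with Hive support; the database, table, and column names are hypothetical.

```python
# Illustrative sketch of Hive partitioning and bucketing via Spark SQL with Hive
# support; table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-partitioning-example")
    .enableHiveSupport()
    .getOrCreate()
)

# Partition by load date, bucket by customer_id to spread rows evenly for joins.
spark.sql("""
    CREATE TABLE IF NOT EXISTS sales.transactions (
        txn_id      BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (load_date STRING)
    CLUSTERED BY (customer_id) INTO 32 BUCKETS
    STORED AS ORC
""")

# Filter on the partition column so only the relevant load_date is scanned.
spark.sql("""
    SELECT customer_id, SUM(amount) AS total_amount
    FROM sales.transactions
    WHERE load_date = '2023-01-15'
    GROUP BY customer_id
""").show()
```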
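The CloudWatch alarms and notifications for EC2 instances could be set up along these lines; this is a hedged boto3 sketch in which the instance ID, SNS topic ARN, and thresholds are placeholder values.

```python
# Hedged sketch of creating a CloudWatch CPU alarm for an EC2 instance with boto3;
# the instance ID, SNS topic ARN, and thresholds are hypothetical placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="ec2-high-cpu-example",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # hypothetical
    Statistic="Average",
    Period=300,                 # evaluate in 5-minute windows
    EvaluationPeriods=2,        # two consecutive breaches trigger the alarm
    Threshold=80.0,             # alarm above 80% average CPU
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],       # hypothetical
)
```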
PROFESSIONAL EXPERIENCE:
Confidential
Big Data Engineer
Responsibilities:
- Involved with impact assessment in terms of schedule changes, dependency impact, and code changes for various change requests on existing Data Warehouse applications running in a production environment.
- Developed data mapping, data governance, transformation, and cleansing rules for the Master Data Management architecture involving OLTP, ODS, and OLAP.
- Worked on performance tuning of the database, including indexes and optimizing SQL statements. Created tables, views, sequences, triggers, tablespaces, and constraints, and generated DDL scripts for physical implementation.
- Worked on optimization of the application and the design of database tables with the right partitioning keys, using the DPF feature of hash partitioning and range partitioning.
- Used SQL for querying the database in the UNIX environment.
- Experience in project development and coordination with onshore-offshore ETL/BI developers and Business Analysts.
- Performed various data analyses at the source level and determined the key attributes for designing fact and dimension tables using a star schema for an effective Data Warehouse and Data Mart.
- Used Erwin's Model Mart for effective model management (sharing, dividing, and reusing model information and designs) to improve productivity. Worked at the conceptual/logical/physical data model level using Erwin according to requirements.
- Worked closely with the Business Analytics manager, who was also part of the design/data modeling team.
- Excellent data analytical, user interaction, and presentation skills.
- Performed data analysis and data profiling using complex SQL on various source systems, including Oracle, SQL Server, and DB2. Analyzed database performance with SQL Profiler and optimized indexes to significantly improve performance.
- Created logical and physical data models using Erwin and reviewed these models with the business team and data architecture team.
- Performed data mining using very complex SQL queries and discovered patterns.
- Performed data management projects and fulfilled ad-hoc requests according to user specifications, utilizing data management software programs and tools such as Perl, Toad, MS Access, Excel, and SQL.
- Prepared scripts for model and data migration from DB2 to the new appliance environments.
- Ensured onsite-to-offshore transition, QA processes, and closure of problems and issues.
- Experienced in creating UNIX scripts for file transfer and file manipulation.
Confidential
Sr. Data Engineer
Responsibilities:
- Extracted, transformed, and loaded data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Ingested data into one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure DW) and processed the data in Azure Databricks; a sketch of the Databricks side of this flow appears after this list.
- Designed and implemented scalable infrastructure and platforms for large-scale data ingestion, aggregation, integration, and analytics in Hadoop, including Spark, Hive, and HBase.
- Built scalable distributed data processing using Hadoop. Created MapReduce programs for transformation, extraction, and aggregation of data in multiple formats such as Avro, Parquet, XML, JSON, CSV, and other compressed file formats; a multi-format read/transform sketch appears after this list.
- Developed an architecture to move the project from Ab Initio to PySpark and Scala Spark. Used Python and Scala programming on a daily basis to perform transformations that apply business logic.
- Set up an HBase column-based storage repository for archiving data on a daily basis.
- Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, a write-back tool, and back.
- Used an enterprise data lake to support various use cases, including analytics, storage, and reporting of voluminous, structured, unstructured, and rapidly changing data.
- Involved in designing, developing, testing, and documenting an application to combine personal loan, credit card, and mortgage data from different countries and load it from the Hive database into a Sybase database for reporting insights.
- Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
- Wrote Hive queries in Spark-SQL for analysis and processing of the data; a sketch of this pattern appears after this list. Used Sqoop to load data from HDFS, Hive, MySQL, and many other sources on a daily basis.
- Created job workflows using the Oozie scheduler. Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
- Converted data load pipeline algorithms written in Python and SQL to Scala Spark and PySpark.
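
A minimal PySpark sketch of the Databricks side of the ingestion flow described above: reading raw files landed in ADLS Gen2 by Data Factory, applying a simple transformation, and writing a curated layer. The storage account, container, paths, and column names are hypothetical, and authentication configuration is assumed to be in place.

```python
# Hedged sketch: read raw CSV files from ADLS Gen2, transform, write a curated
# Parquet layer. Account, container, paths, and columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("adls-ingestion-example").getOrCreate()

raw_path = "abfss://raw@exampleaccount.dfs.core.windows.net/loans/2023/"      # hypothetical
curated_path = "abfss://curated@exampleaccount.dfs.core.windows.net/loans/"   # hypothetical

raw_df = spark.read.option("header", "true").csv(raw_path)

curated_df = (
    raw_df
    .withColumn("ingest_date", F.current_date())
    .filter(F.col("amount").isNotNull())      # "amount" is a hypothetical column
)

# Partition the curated layer by ingest date for downstream analytics.
curated_df.write.mode("append").partitionBy("ingest_date").parquet(curated_path)
```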
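The multi-format transformation work could be illustrated with a sketch like this, which reads a few of the listed formats in PySpark and unifies them for an aggregation; paths and column names are hypothetical, and the Avro reader assumes the spark-avro package is on the classpath.

```python
# Hedged sketch of reading several file formats and unifying them in PySpark;
# paths and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("multi-format-example").getOrCreate()

parquet_df = spark.read.parquet("/data/raw/events_parquet/")              # hypothetical path
json_df = spark.read.json("/data/raw/events_json/")                       # hypothetical path
csv_df = spark.read.option("header", "true").csv("/data/raw/events_csv/") # hypothetical path
avro_df = spark.read.format("avro").load("/data/raw/events_avro/")        # needs spark-avro


def normalize(df):
    """Project each source onto a common, explicitly typed schema before the union."""
    return df.select(
        F.col("event_id").cast("string").alias("event_id"),
        F.col("event_type").cast("string").alias("event_type"),
        F.col("amount").cast("double").alias("amount"),
    )


unified = (
    normalize(parquet_df)
    .unionByName(normalize(json_df))
    .unionByName(normalize(csv_df))
    .unionByName(normalize(avro_df))
)

# Simple aggregation across all sources.
unified.groupBy("event_type").agg(F.sum("amount").alias("total_amount")).show()
```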
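The "Hive queries in Spark-SQL" pattern might look like the following sketch; the database, tables, and columns are hypothetical, and the result is written to a staging table that a separate Sqoop export job would push to the relational database for the BI team.

```python
# Hedged sketch of running an analytical Hive query through Spark SQL;
# database, table, and column names are hypothetical.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("hive-analysis-example")
    .enableHiveSupport()
    .getOrCreate()
)

monthly_exposure = spark.sql("""
    SELECT country,
           product_type,
           SUM(outstanding_balance) AS total_exposure
    FROM lending.accounts
    WHERE snapshot_month = '2023-01'
    GROUP BY country, product_type
""")

# Staging table consumed by the downstream Sqoop export / BI reporting jobs.
monthly_exposure.write.mode("overwrite").saveAsTable("reporting.monthly_exposure_stage")
```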
Confidential
Data Engineer
Responsibilities:
- Involved in building a scalable distributed data lake system for Confidential's real-time and batch analytical needs.
- Involved in designing, reviewing, and optimizing data transformation processes using Apache Storm.
- Experience in job management using Fair Scheduling; developed job processing scripts using Control-M workflows.
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive. Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
- Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning.
- Loaded data into Spark RDDs and performed in-memory data computation to generate the output response. Optimized existing algorithms in Hadoop using SparkContext, Spark-SQL, DataFrames, and pair RDDs.
- Performed advanced procedures like text analytics and processing using the in-memory computing capabilities of Spark with Scala. Imported data from a Kafka consumer into HBase using Spark Streaming.
- Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows. Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, effective and efficient joins, and transformations during the ingestion process itself; a broadcast-join sketch appears after this list.
- Worked on migrating legacy MapReduce programs into Spark transformations using Spark and Scala.
- Worked on a POC to compare processing time for Impala with Apache Hive for batch applications, in order to implement the former in the project. Worked extensively with Sqoop for importing metadata from Oracle.
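
The broadcast-join and partitioning techniques referenced above can be sketched as follows in PySpark; dataset paths, partition counts, and column names are hypothetical.

```python
# Small PySpark sketch of a broadcast join with explicit repartitioning;
# paths, partition counts, and column names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-example").getOrCreate()

# Large fact dataset, repartitioned on the join key for balanced tasks.
transactions = (
    spark.read.parquet("/data/warehouse/transactions/")       # hypothetical path
    .repartition(200, "customer_id")
)

# Small dimension dataset, broadcast to every executor to avoid a shuffle join.
customers = spark.read.parquet("/data/warehouse/customers/")  # hypothetical path

enriched = transactions.join(broadcast(customers), on="customer_id", how="left")

# Cache the enriched dataset if several downstream aggregations reuse it.
enriched.cache()
enriched.groupBy("segment").agg(F.sum("amount").alias("total_amount")).show()
```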
Confidential
PL/SQL Developer
Responsibilities:
- Using SQL Navigator, created advanced PL/SQL packages, procedures, triggers, functions, indexes, and collections to implement business logic. For the remote instance, generated server-side PL/SQL scripts for data manipulation and validation, as well as materialized views.
- Involved in determining the main systems' process flow, workflow, and data flow. Used exception handling to catch errors and make debugging and displaying error warnings in the program easier.
- Built high-performance data integration solutions, including extraction, transformation, and load packages for data warehousing, utilizing the SQL Server SSIS tool. Extracted data from XML files and loaded it into the database.
- Tested all forms and PL/SQL code for logic correction. Data analysis for data conversion included data mapping from source to target database schemas, specification, and building data extract scripts and programming the data conversion in test and production settings. Carried out extensive learning and development activities.
- Administered all database objects, including tables, clusters, indexes, views, sequences, packages, and procedures. Involved in the full development cycle of planning, analysis, design, development, testing, and implementation. Designed and created all the tables and views for the system in Oracle.
- Involved in building, debugging, and running forms. Involved in data loading and extraction functions using SQL*Loader. Developed PL/SQL triggers and master tables for automatic creation of primary keys. Designed and developed Oracle Forms & Reports, generating up to 60 reports. Designed and developed forms validation procedures for querying and updating data.