
Big Data Engineer Resume


PROFESSIONAL SUMMARY:

  • 7 years of technical experience in data analysis and data modeling: understanding the business needs of clients, developing effective and efficient solutions, and ensuring client deliverables within committed timelines.
  • Experience in Python, Scala, C, and SQL, with a thorough understanding of distributed systems architecture and parallel processing frameworks.
  • Constructing and manipulating large datasets of structured, semi-structured, and unstructured data and supporting systems application architecture using tools such as SAS, SQL, Python, R, Minitab, Power BI, and more to extract multi-factor interactions and drive change.
  • Using the S3 CLI tools, created scripts for creating new snapshots and deleting existing snapshots in S3 (a hedged boto3 sketch follows this list).
  • Expertise with Hive data warehouse architecture, including table creation, data distribution using partitioning and bucketing, and query development and tuning in Hive Query Language (a DDL sketch follows this list).
  • Deep knowledge and strong deployment experience in Hadoop and Big Data ecosystems: HDFS, MapReduce, Spark, Pig, Sqoop, Hive, Oozie, Kafka, ZooKeeper, and HBase.
  • Good understanding of client-server architecture for developing applications using TCP and UDP.
  • Used software development methodologies such as Agile, Scrum, TDD, and Waterfall.
  • Experience in optimizing volumes and EC2 instances, creating multiple VPC instances, and setting up alarms and notifications for EC2 instances using CloudWatch (an alarm sketch follows this list).
  • Good experience in developing desktop and web applications using Java and Spring.
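
A minimal sketch of the S3 snapshot housekeeping described above, written with the boto3 SDK rather than the raw S3 CLI; the bucket name, prefixes, and 30-day retention window are illustrative assumptions, not details from the original work.

    # Hypothetical sketch: "snapshot" an S3 prefix by copying objects into a dated
    # snapshots/<date>/ prefix and delete snapshots older than a retention window.
    # Bucket, prefixes, and retention are placeholders.
    import datetime
    import boto3

    BUCKET = "example-data-bucket"      # hypothetical bucket name
    SOURCE_PREFIX = "current/"          # hypothetical live-data prefix
    SNAPSHOT_ROOT = "snapshots/"        # hypothetical snapshot location
    RETENTION_DAYS = 30

    s3 = boto3.client("s3")

    def create_snapshot():
        """Copy every object under SOURCE_PREFIX into a dated snapshot prefix."""
        today = datetime.date.today().isoformat()
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=BUCKET, Prefix=SOURCE_PREFIX):
            for obj in page.get("Contents", []):
                dest_key = f"{SNAPSHOT_ROOT}{today}/{obj['Key']}"
                s3.copy_object(
                    Bucket=BUCKET,
                    Key=dest_key,
                    CopySource={"Bucket": BUCKET, "Key": obj["Key"]},
                )

    def delete_old_snapshots():
        """Delete snapshot objects whose date folder is past the retention window."""
        cutoff = datetime.date.today() - datetime.timedelta(days=RETENTION_DAYS)
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=BUCKET, Prefix=SNAPSHOT_ROOT):
            for obj in page.get("Contents", []):
                # Key layout assumed here: snapshots/YYYY-MM-DD/<original key>
                snapshot_date = datetime.date.fromisoformat(obj["Key"].split("/")[1])
                if snapshot_date < cutoff:
                    s3.delete_object(Bucket=BUCKET, Key=obj["Key"])

    if __name__ == "__main__":
        create_snapshot()
        delete_old_snapshots()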
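
A minimal sketch of Hive table creation with partitioning and bucketing, issued here through PySpark's SQL interface for consistency with the other examples; the table, columns, bucket count, and storage format are illustrative assumptions (on older Spark releases, bucketed Hive tables may need to be created and loaded from the Hive CLI instead).

    # Hypothetical sketch of Hive partitioning and bucketing DDL plus a
    # partition-pruned query. All names are illustrative.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-ddl-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Partition by load date; bucket by customer_id to help joins and sampling.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS sales_txn (
            txn_id      BIGINT,
            customer_id BIGINT,
            amount      DECIMAL(12,2)
        )
        PARTITIONED BY (load_date STRING)
        CLUSTERED BY (customer_id) INTO 32 BUCKETS
        STORED AS ORC
    """)

    # Query tuning usually starts with partition pruning: filtering on the
    # partition column so only the matching load_date directories are scanned.
    daily = spark.sql("""
        SELECT customer_id, SUM(amount) AS total_amount
        FROM sales_txn
        WHERE load_date = '2020-01-01'
        GROUP BY customer_id
    """)
    daily.show()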
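
A minimal boto3 sketch of a CloudWatch alarm on an EC2 instance, of the kind mentioned above; the instance ID, SNS topic ARN, threshold, and alarm name are placeholders.

    # Hypothetical sketch: create a CloudWatch CPU alarm for one EC2 instance
    # and send notifications through an SNS topic.
    import boto3

    cloudwatch = boto3.client("cloudwatch")

    cloudwatch.put_metric_alarm(
        AlarmName="high-cpu-example-instance",           # placeholder name
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],
        Statistic="Average",
        Period=300,                                       # 5-minute datapoints
        EvaluationPeriods=2,                              # 2 consecutive breaches
        Threshold=80.0,                                   # percent CPU
        ComparisonOperator="GreaterThanThreshold",
        AlarmActions=["arn:aws:sns:us-east-1:123456789012:ops-alerts"],  # placeholder ARN
    )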

PROFESSIONAL EXPERIENCE:

Confidential

Big Data Engineer

Responsibilities:

  • Involved with impact assessment in terms of schedule changes, dependency impact, and code changes for various change requests on the existing Data Warehouse applications running in a production environment.
  • Developed Data Mapping, Data Governance, Transformation, and Cleansing rules for the Master Data Management architecture involving OLTP, ODS, and OLAP.
  • Worked on performance tuning of the database, including indexes and optimizing SQL statements. Created tables, views, sequences, triggers, tablespaces, and constraints, and generated DDL scripts for physical implementation.
  • Worked on optimization of the application and the design of the database tables with the right partitioning keys, using the DPF feature of hash partitioning and range partitioning (a hedged DDL sketch follows this list). Used SQL for querying the database in the UNIX environment.
  • Experience in project development and coordination with onshore-offshore ETL/BI developers and business analysts.
  • Performed various data analyses at the source level and determined the key attributes for designing fact and dimension tables using a star schema for an effective data warehouse and data mart.
  • Used the Model Mart feature of Erwin for effective model management: sharing, dividing, and reusing model information and designs for productivity improvement. Worked at the conceptual/logical/physical data model level using Erwin according to requirements. Worked closely with the Business Analytics manager, who was also part of the design/data modeling team.
  • Excellent data analytical, user interaction, and presentation skills. Performed data analysis and data profiling using complex SQL on various source systems, including Oracle, SQL Server, and DB2 (see the profiling sketch after this list). Analyzed database performance with SQL Profiler and optimized indexes to significantly improve performance.
  • Created logical and physical data models using Erwin and reviewed these models with the business team and the data architecture team. Performed data mining using very complex SQL queries and discovered patterns.
  • Performed data management projects and fulfilled ad-hoc requests according to user specifications using data management software programs and tools such as Perl, Toad, MS Access, Excel, and SQL. Prepared scripts for model and data migration from DB2 to the new appliance environments. Ensured onsite-to-offshore transition, QA processes, and closure of problems and issues. Experienced in creating UNIX scripts for file transfer and file manipulation.
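
A hedged sketch of the hash/range partitioning mentioned above, assuming DB2 with the Database Partitioning Feature; the table, columns, and the EVERY clause are illustrative and may need adjusting to the exact DB2 version in use.

    # Hypothetical DB2 DPF-style DDL: hash-distribute a fact table across
    # database partitions and range-partition it by date. `conn` is any
    # PEP 249 (DB-API) connection, e.g. one from ibm_db_dbi.
    ORDERS_DDL = """
    CREATE TABLE orders_fact (
        order_id    BIGINT        NOT NULL,
        customer_id INTEGER       NOT NULL,
        order_date  DATE          NOT NULL,
        amount      DECIMAL(12,2)
    )
    DISTRIBUTE BY HASH (customer_id)
    PARTITION BY RANGE (order_date)
        (STARTING FROM ('2019-01-01') ENDING ('2019-12-31') EVERY (1) MONTH)
    """

    def create_orders_fact(conn):
        """Run the partitioning DDL through a DB-API connection."""
        cur = conn.cursor()
        cur.execute(ORDERS_DDL)
        conn.commit()
        cur.close()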
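
A minimal sketch of column-level data profiling with plain SQL over a generic DB-API connection (Oracle, SQL Server, or DB2 driver); the table and column names are placeholders and the driver-specific connect call is omitted.

    # Hypothetical profiling helper: row count, non-null count, and distinct
    # count for one column, via any PEP 249 (DB-API) connection.
    def profile_column(conn, table, column):
        """Return basic profiling statistics for one column of one table."""
        sql = (
            f"SELECT COUNT(*) AS total_rows, "
            f"       COUNT({column}) AS non_null_rows, "
            f"       COUNT(DISTINCT {column}) AS distinct_values "
            f"FROM {table}"
        )
        cur = conn.cursor()
        cur.execute(sql)
        total_rows, non_null_rows, distinct_values = cur.fetchone()
        cur.close()
        return {
            "table": table,
            "column": column,
            "total_rows": total_rows,
            "null_rows": total_rows - non_null_rows,
            "distinct_values": distinct_values,
        }

    # Usage (driver-specific connect call omitted):
    # stats = profile_column(conn, "customer_dim", "email_address")
    # print(stats)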

Confidential

Sr. Data Engineer

Responsibilities:

  • Extract, Transform, and Load data from source systems to Azure data storage services using a combination of Azure Data Factory, T-SQL, Spark SQL, and U-SQL (Azure Data Lake Analytics). Data ingestion to one or more Azure services (Azure Data Lake, Azure Storage, Azure SQL, Azure SQL Data Warehouse) and processing of the data in Azure Databricks.
  • Designed and implemented scalable infrastructure and platform for large amounts of data ingestion, aggregation, integration, and analytics in Hadoop, including Spark, Hive, and HBase.
  • Building distributed, scalable data systems using Hadoop. Creating MapReduce programs to enable transformation, extraction, and aggregation of data in multiple formats such as Avro, Parquet, XML, JSON, CSV, and other compressed file formats (a PySpark sketch follows this list).
  • Developing an architecture to move the project from Ab Initio to PySpark and Scala Spark. Using Python and Scala programming on a daily basis to perform transformations that apply business logic.
  • Setting up an HBase column-based storage repository for archiving data on a daily basis.
  • Created pipelines in ADF using Linked Services/Datasets/Pipelines to extract, transform, and load data from different sources such as Azure SQL, Blob Storage, Azure SQL Data Warehouse, and a write-back tool, and back again.
  • Using an enterprise data lake to support various use cases, including analytics, storage, and reporting of voluminous, structured, unstructured, and rapidly changing data.
  • Involved in designing, developing, testing, and documenting an application to combine personal loan, credit card, and mortgage data from different countries and load it from the Hive database into a Sybase database for reporting insights.
  • Implemented an enterprise-grade platform (MarkLogic) for ETL from mainframe to NoSQL (Cassandra).
  • Writing Hive queries in Spark SQL for analysis and processing of the data (see the sketch after this list). Using Sqoop to load data from HDFS, Hive, MySQL, and many other sources on a daily basis. Creating job workflows using the Oozie scheduler. Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
  • Converting data load pipeline algorithms written in Python and SQL to Scala Spark and PySpark.
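
A minimal PySpark sketch of the multi-format handling described above: reading CSV and JSON inputs, applying a simple aggregation, and writing Parquet and Avro outputs. Paths, column names, and the business rule are illustrative assumptions; the Avro writer assumes the spark-avro package is on the classpath.

    # Hypothetical multi-format ETL sketch in PySpark.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("multi-format-etl-sketch").getOrCreate()

    csv_df = (
        spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("hdfs:///data/raw/transactions_csv/")      # placeholder path
    )
    json_df = spark.read.json("hdfs:///data/raw/transactions_json/")  # placeholder path

    # Example business logic, assuming both sources share the same schema:
    # unify them and aggregate amounts per account and day.
    combined = csv_df.unionByName(json_df)
    daily_totals = (
        combined
        .withColumn("txn_date", F.to_date("txn_timestamp"))
        .groupBy("account_id", "txn_date")
        .agg(F.sum("amount").alias("total_amount"))
    )

    daily_totals.write.mode("overwrite").parquet("hdfs:///data/curated/daily_totals_parquet/")
    daily_totals.write.mode("overwrite").format("avro").save("hdfs:///data/curated/daily_totals_avro/")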
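
A minimal sketch of writing Hive queries through Spark SQL and persisting the aggregate back to Hive for downstream export and reporting; the database, table, and column names are placeholders, and the actual export to the reporting database (e.g. via Sqoop) would happen outside Spark.

    # Hypothetical Spark SQL over Hive sketch.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("hive-spark-sql-sketch")
        .enableHiveSupport()
        .getOrCreate()
    )

    loans_by_country = spark.sql("""
        SELECT country_code,
               product_type,            -- personal loan / credit card / mortgage
               COUNT(*)      AS accounts,
               SUM(balance)  AS total_balance
        FROM lending_db.accounts
        WHERE snapshot_date = '2020-12-31'
        GROUP BY country_code, product_type
    """)

    # Persist the aggregate back to Hive for the downstream export / BI reports.
    (loans_by_country
        .write
        .mode("overwrite")
        .saveAsTable("lending_db.loans_by_country_summary"))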

Confidential

Data Engineer

Responsibilities:

  • Involved in building a scalable, distributed data lake system for Confidential real-time and batch analytical needs. Involved in designing, reviewing, and optimizing data transformation processes using Apache Storm.
  • Experience in job management using fair scheduling; developed job processing scripts using Control-M workflows.
  • Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive. Developed Scala scripts and UDFs using both DataFrames/SQL and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications: setting the right batch interval time, the correct level of parallelism, and memory tuning.
  • Loaded the data into Spark RDDs and performed in-memory data computation to generate the output response (a caching/broadcast-join sketch follows this list). Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Imported data from a Kafka consumer into HBase using Spark Streaming (see the streaming sketch after this list).
  • Experienced in using ZooKeeper and Oozie operational services for coordinating the cluster and scheduling workflows. Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, and Sqoop, as well as system-specific jobs.
  • Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts in Spark, effective and efficient joins, and transformations during the ingestion process itself.
  • Worked on migrating legacy MapReduce programs into Spark transformations using Spark and Scala.
  • Worked on a POC to compare processing time for Impala with Apache Hive for batch applications, in order to implement the former in the project. Worked extensively with Sqoop for importing metadata from Oracle.
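
A minimal sketch of two of the tuning techniques listed above: caching a dataset that is reused by several computations, and broadcasting a small lookup table so the join avoids a shuffle. Paths and column names are placeholders.

    # Hypothetical caching and broadcast-join sketch in PySpark.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import broadcast

    spark = SparkSession.builder.appName("spark-tuning-sketch").getOrCreate()

    # Large fact data reused by several downstream aggregations: cache it once.
    transactions = spark.read.parquet("hdfs:///data/curated/transactions/")  # placeholder
    transactions.cache()

    # Small lookup table: broadcast it so each executor holds a full copy and
    # the join is performed map-side, without shuffling the large side.
    merchants = spark.read.parquet("hdfs:///data/reference/merchants/")      # placeholder
    enriched = transactions.join(broadcast(merchants), on="merchant_id", how="left")

    per_category = enriched.groupBy("merchant_category").count()
    per_category.show()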
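
A hedged sketch of the Kafka-to-HBase flow in the Spark 1.x streaming style that the bullet above mentions; the broker, topic, and table names, and the use of happybase for the HBase write, are assumptions for illustration rather than details of the original project.

    # Hypothetical Kafka -> Spark Streaming -> HBase sketch (Spark 1.x APIs).
    import json
    import happybase
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="kafka-to-hbase-sketch")
    ssc = StreamingContext(sc, batchDuration=10)   # 10-second micro-batches

    stream = KafkaUtils.createDirectStream(
        ssc,
        topics=["events"],                                    # placeholder topic
        kafkaParams={"metadata.broker.list": "broker1:9092"}  # placeholder broker
    )

    def write_partition(records):
        """Write one partition of (key, json-string) records into HBase."""
        conn = happybase.Connection("hbase-host")             # placeholder host
        table = conn.table("events")                          # placeholder table
        for _, value in records:
            event = json.loads(value)                         # assumes JSON payloads
            table.put(event["event_id"].encode(),
                      {b"d:payload": value.encode()})
        conn.close()

    stream.foreachRDD(lambda rdd: rdd.foreachPartition(write_partition))

    ssc.start()
    ssc.awaitTermination()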

Confidential

PL/SQL Developer

Responsibilities:

  • Using SQL Navigator, created advanced PL/SQL packages, procedures, triggers, functions, indexes, and collections to implement business logic. For the remote instance, generated server-side PL/SQL scripts for data manipulation and validation, as well as materialized views.
  • Involved in determining the main systems' process flow, workflow, and data flow. Used exception handling to trap errors and to make debugging and displaying error messages in the programs easier.
  • Built high-performance data integration solutions, including extraction, transformation, and load packages for data warehousing, utilizing the SQL Server SSIS tool. Extracted data from XML files and loaded it into the database.
  • Tested all forms and PL/SQL code for logic correctness. Data analysis for data conversion included mapping data from source to target database schemas, writing specifications, and building data extract scripts/programs for data conversion in test and production environments. Extensive learning and development activities.
  • Administered all database objects, including tables, clusters, indexes, views, sequences, packages, and procedures. Involved in the full development cycle of planning, analysis, design, development, testing, and implementation. In Oracle, designed and created all the tables and views for the system.
  • Involved in building, debugging, and running forms. Involved in data loading and extracting functions using SQL*Loader. Developed PL/SQL triggers and master tables for automatic creation of primary keys (a hedged sketch follows this list). Designed and developed Oracle Forms and Reports, generating up to 60 reports. Designed and developed forms validation procedures for query and update of data.
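
A hedged sketch of the automatic primary key pattern mentioned above: an Oracle sequence plus a BEFORE INSERT trigger, installed through a generic DB-API connection; the table, sequence, and trigger names are placeholders.

    # Hypothetical sequence + trigger DDL for auto-populated primary keys.
    # `conn` is any Oracle DB-API connection (e.g. from python-oracledb / cx_Oracle).
    SEQUENCE_DDL = "CREATE SEQUENCE orders_seq START WITH 1 INCREMENT BY 1"

    TRIGGER_DDL = """
    CREATE OR REPLACE TRIGGER orders_pk_trg
    BEFORE INSERT ON orders
    FOR EACH ROW
    WHEN (NEW.order_id IS NULL)
    BEGIN
        SELECT orders_seq.NEXTVAL INTO :NEW.order_id FROM dual;
    END;
    """

    def install_auto_pk(conn):
        """Create the sequence and the BEFORE INSERT trigger through DB-API."""
        cur = conn.cursor()
        cur.execute(SEQUENCE_DDL)
        cur.execute(TRIGGER_DDL)
        cur.close()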
