Data Engineer/Big Data Developer Resume
SUMMARY
- 8+ years of IT experience with multinational clients, including 4+ years with Big Data ecosystem components such as Apache Spark, Hadoop, Hive, HBase, SQL, and Sqoop.
- Hands-on experience with the fundamental building blocks of Spark (RDDs) and the operations used to implement business logic, such as transformations, actions, and functions performed on RDDs.
- Experience handling tickets/incidents under stringent SLAs and providing 24x7 support for critical production environments.
- Support includes file system management and monitoring, cluster monitoring and management, and automating/scripting backups and restores.
- Performed root cause analysis and identified and implemented corrective and preventive measures.
- Documented standards, processes, and procedures relating to best practices, issues, and resolutions.
- Experienced in real-time Big Data solutions using HBase, handling billions of records.
- Experienced in importing and exporting data between RDBMS and HDFS using Sqoop.
- Experience with Azure transformation projects and Azure architecture decision making; architected and implemented ETL and data movement solutions using Azure Data Factory (ADF) and SSIS.
- Good working knowledge of processing batch applications.
- Experienced in writing MapReduce programs and UDFs in Hive.
- In-depth understanding of DataFrames and Datasets in Spark SQL.
- Exposure to Apache Kafka for building data pipelines that carry logs as streams of messages using producers and consumers (a Kafka sketch follows this summary).
- Hands-on experience with GCP (Google Cloud Platform) and good knowledge of GCP storage services.
- Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as JSON and Avro.
- Architected and implemented medium- to large-scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Good knowledge of Data Warehousing, ETL development, Distributed Computing and large-scale data processing.
- Experience in Agile methodology using the JIRA Scrum tool.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
- Strong knowledge of Software Development Life Cycle and expertise in detailed design documentation.
- Excellent communication skills; able to perform at a high level and meet deadlines.
- Developed Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files with different schemas into Hive ORC tables (a PySpark sketch follows this summary).
- Worked on reading and writing multiple data formats such as JSON, ORC, and Parquet on HDFS using PySpark.
- Analyzed SQL scripts and re-implemented them using PySpark SQL for faster performance.
- Optimized Hive queries using best practices and the right parameters, along with technologies such as Hadoop, YARN, Python, and PySpark.
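The CSV-to-ORC loading pattern mentioned above can be sketched as follows. This is a minimal, illustrative PySpark sketch only: the HDFS paths, feed names, and the analytics.events_orc table are hypothetical placeholders, not actual project objects.

```python
# Minimal PySpark sketch: load CSV feeds that may have differing schemas,
# align them to a common set of columns, and write them to a Hive ORC table.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("csv_to_hive_orc")
    .enableHiveSupport()          # needed so saveAsTable targets the Hive metastore
    .getOrCreate()
)

# Read each CSV feed separately because the schemas differ (assumed layout).
df_a = spark.read.option("header", True).csv("hdfs:///landing/feed_a/")
df_b = spark.read.option("header", True).csv("hdfs:///landing/feed_b/")

# Align both frames on the union of their columns, filling gaps with nulls.
all_cols = sorted(set(df_a.columns) | set(df_b.columns))

def align(df):
    for c in [c for c in all_cols if c not in df.columns]:
        df = df.withColumn(c, F.lit(None).cast("string"))
    return df.select(all_cols)

combined = align(df_a).unionByName(align(df_b))

# Write as ORC into a Hive table (overwrite for a full reload).
spark.sql("CREATE DATABASE IF NOT EXISTS analytics")
combined.write.mode("overwrite").format("orc").saveAsTable("analytics.events_orc")
```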
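As a companion to the Kafka bullet above, the following kafka-python sketch shows a producer publishing log records and a consumer reading them back. The broker address and the app-logs topic are assumed names used only for illustration.

```python
# Minimal kafka-python sketch: publish log lines as messages and consume them
# on the other side of the pipeline.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="broker1:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("app-logs", {"level": "INFO", "msg": "user login"})
producer.flush()

consumer = KafkaConsumer(
    "app-logs",
    bootstrap_servers="broker1:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for record in consumer:
    print(record.value)   # downstream processing would go here
    break                 # stop after one message in this sketch
```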
TECHNICAL SKILLS
Big Data: Hadoop, Hive, Apache Spark, Spark SQL, Zookeeper, Data Factory, Sqoop, Hue, HBase, MySQL, ThoughtSpot, SQL, Apache Kafka, BigID, Automic
Data Warehousing: Informatica PowerCenter 9.x/8.x/7.x, Informatica Cloud, Talend Open Studio & Integration Suite, Azure SQL Analytics
Languages: Python, Shell Scripting, Selenium
Database: MySQL, Oracle, Microsoft SQL Server
IDE / Testing Tools: Eclipse, IntelliJ IDEA
Operating System: Windows, UNIX, Linux
Cloud Computing: AWS, Azure, Rackspace, OpenStack
SDLC Methodologies: Agile/Scrum, Waterfall
Scripting Languages: Unix, Python, Windows PowerShell
RDBMS Utility: Toad, SQL Plus, SQL Loader
PROFESSIONAL EXPERIENCE
Confidential
Data Engineer/Big Data Developer
Responsibilities:
- Worked with Hive components to improve performance and optimization in Hadoop.
- Developed custom aggregate functions using Spark SQL and performed interactive querying
- Applied a solid understanding of partitioning and bucketing concepts in Hive, and designed both managed and external tables in Hive to optimize performance (a Hive DDL sketch follows this role).
- Created Hive external tables, views, and scripts for transformations such as filtering, aggregation, and partitioning of tables.
- Responsible for monitoring production status, ensuring the ETL process works as expected, and handling customer communication around production issues.
- In-depth knowledge of hardware sizing, capacity management, cluster management and maintenance, performance monitoring, and configuration.
- Respond to system generated alerts/escalations relating to any failures on application platform.
- Design and implement database solutions in Azure SQL Data Warehouse, Azure SQL
- Architect & implement medium to large scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
- Handled importing data from various data sources, performed transformations using Hive, and loaded data between Teradata and HDFS.
- Migrated on-premises Informatica ETL processes to the AWS cloud and Snowflake.
- Expert in writing business-analytics scripts using HiveQL.
- Worked with Python to process log files and produce cleaner, optimized results.
- Developed custom aggregate functions using Python dictionaries.
- Used Python regular-expression functions such as findall, search, split, and sub (a short re-module sketch follows this role).
- Worked on automation using Selenium with Python
- Followed Agile methodology and Scrum meetings to track, optimize, and tailor features to customer needs.
- Gained good business knowledge across different categories of products and designs.
- Involved in developing ThoughtSpot reports and automated workflows to load data.
- Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
Environment: HDFS, Hive, Sqoop, SQL, Zookeeper, Hortonworks, Hue, LINUX, Big Data, UNIX Shell Scripting, Spark, Putty, ThoughtSpot, Aorta framework
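The Hive partitioning work noted in this role can be sketched through Spark's Hive support as below. The database, table, columns, staging table, and HDFS location are all hypothetical; bucketing is mentioned only in a comment since loading bucketed Hive tables is usually handled through Hive itself.

```python
# Minimal sketch of a partitioned Hive external table created and loaded via Spark.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive_ddl").enableHiveSupport().getOrCreate()

spark.sql("CREATE DATABASE IF NOT EXISTS sales")
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales.orders (
        order_id    BIGINT,
        customer_id BIGINT,
        amount      DOUBLE
    )
    PARTITIONED BY (order_date STRING)
    STORED AS ORC
    LOCATION 'hdfs:///warehouse/external/sales/orders'
""")
# Bucketing would add a CLUSTERED BY (customer_id) INTO 16 BUCKETS clause,
# typically when the table is created and loaded through Hive itself.

# Static-partition load from an assumed staging table; dynamic partitioning is
# the usual choice when many dates arrive at once.
spark.sql("""
    INSERT OVERWRITE TABLE sales.orders PARTITION (order_date = '2021-01-01')
    SELECT order_id, customer_id, amount
    FROM sales.orders_staging
    WHERE order_date = '2021-01-01'
""")
```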
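A quick illustration of the re-module calls referenced in this role; the log-line format is invented for the example.

```python
# Sketch of findall, search, split, and sub against a sample log line.
import re

line = "2021-01-01 12:00:03 ERROR payment-svc timeout after 30s"

timestamps = re.findall(r"\d{2}:\d{2}:\d{2}", line)    # ['12:00:03']
level      = re.search(r"\b(INFO|WARN|ERROR)\b", line)  # match object or None
tokens     = re.split(r"\s+", line)                     # whitespace-separated fields
masked     = re.sub(r"\d", "#", line)                   # replace every digit

print(timestamps, level.group(1) if level else None, tokens[:3], masked)
```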
Confidential
Big Data developer
Responsibilities:
- Involved in requirements gathering in coordination with business analysts.
- Responsible for creating technical documents such as High-Level Design and Low-Level Design specifications.
- Installed and configured Cloudera Manager for easy management of existing Hadoop cluster
- Configured various property files like core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and hadoop-env.sh based on job requirements.
- Used Sqoop to transfer data between RDBMS and HDFS.
- Involved in ETL architecture enhancements to increase performance using the query optimizer.
- Worked with business functional lead to review and finalize requirements and data profiling analysis.
- Implemented complex Spark programs to perform joins across different tables (a join sketch follows this role).
- Explored Spark to improve the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, and pair RDDs.
- Created ETL metadata reports using SSRS; reports included execution times for SSIS packages and failure reports with error descriptions.
- Responsible for creating tables based on business requirements
- Produced data visualizations and generated reports for clear results.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, XML, and JSON.
- Utilized Agile Scrum methodology to help manage and organize the project with the professor, along with regular code review sessions.
Environment: Hadoop HDFS, Apache Spark, Spark Core, Spark SQL, Scala, JDK 1.8, CDH 5, Sqoop, MySQL, CentOS Linux
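To illustrate the join work in this role, here is a minimal PySpark sketch over two hypothetical Hive tables (sales.orders and sales.customers, with an assumed customer_name column).

```python
# Minimal sketch of joining a fact table with a small dimension table.
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("order_joins").enableHiveSupport().getOrCreate()

orders    = spark.table("sales.orders")
customers = spark.table("sales.customers")   # small dimension table (assumed)

# Inner join on the key, broadcasting the smaller side to avoid a shuffle.
enriched = (
    orders.join(broadcast(customers), on="customer_id", how="inner")
          .select("order_id", "customer_id", "customer_name", "amount")
)

# Left anti join: orders whose customer is missing from the dimension table.
orphans = orders.join(customers, on="customer_id", how="left_anti")

enriched.show(5)
orphans.show(5)
```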
Confidential
SQL Developer
Responsibilities:
- Involved in database design to create new databases that maintained referential integrity.
- Generated database SQL scripts and deployed databases including installation and configuration.
- Created indexed views and appropriate indexes to reduce the running time of complex queries (an indexed-view sketch follows this role).
- Participated in designing a data warehouse to store all information from OLTP to Staging and Staging to Enterprise Data warehouse to do better analysis.
- Used basic Script Task for renaming the file names, storing Row Count Values into User Variables.
- Actively participated in designing the SSIS Package Model for initial load and incremental load for extracting, transforming, and loading data from and to various RDBMS and Non-RDBMS sources and destinations.
- Analyzed data from different sources using a Hadoop-based Big Data solution, implementing Azure Data Factory, Azure Data Lake, Azure Data Lake Analytics, HDInsight, Hive, and Sqoop.
- Migrated on-premises data (SQL Server) to Azure Data Lake Store (ADLS) using Azure Data Factory.
- Documentation of all the processes involved in maintaining the database for future reference.
- Performed unit tests and was involved in deploying database objects to test/production environments.
- Involved in creating ER diagrams, mapping the data into database objects, and designing the database and tables.
- Created dashboards using data blending across different databases and tables to meet business requirements.
- Experience in creating Indexes for faster performance and views for controlling user access to data.
- Performed unit testing, provided bug fixes and deployment support.
Environment: SQL Server 2014, Visual Studio 2014, TFS, SSRS, SSIS, Waterfall methodology
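The indexed-view bullet in this role can be sketched as below. The connection string, dbo.Orders table, and column names are assumptions, and Amount is assumed NOT NULL, which SQL Server requires for SUM inside an indexed view.

```python
# Minimal pyodbc sketch: create a schema-bound view plus a unique clustered
# index on it, i.e. an indexed view that pre-aggregates order totals.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=sqlhost;DATABASE=Sales;Trusted_Connection=yes;",
    autocommit=True,
)
cur = conn.cursor()

# Indexed views require SCHEMABINDING, two-part object names, and COUNT_BIG(*)
# when the view contains GROUP BY; Amount is assumed NOT NULL.
cur.execute("""
    CREATE VIEW dbo.vw_DailySales WITH SCHEMABINDING AS
    SELECT OrderDate,
           SUM(Amount)  AS TotalAmount,
           COUNT_BIG(*) AS RowCnt
    FROM dbo.Orders
    GROUP BY OrderDate
""")

# The unique clustered index is what materializes the view.
cur.execute("""
    CREATE UNIQUE CLUSTERED INDEX IX_vw_DailySales
    ON dbo.vw_DailySales (OrderDate)
""")

conn.close()
```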