
Data Engineer/Big Data Developer Resume


SUMMARY

  • Over 8 years of IT experience with multinational clients, including 4+ years with Big Data ecosystem components such as Apache Spark, Hadoop, Hive, HBase, SQL, and Sqoop.
  • Hands-on experience with the fundamental building blocks of Spark (RDDs) and the transformations, actions, and functions performed on them to implement business logic.
  • Experience handling tickets/incidents under stringent SLAs and providing 24x7 support to critical production environments.
  • Support includes file system management and monitoring, cluster monitoring and management, and automating/scripting backups and restores.
  • Perform root cause analysis and identify and implement corrective and preventive measures
  • Document standards, processes and procedures relating to best practices, issues and resolutions
  • Experienced in real-time Big Data solutions using HBase, handling billions of records.
  • Experienced in importing and exporting data between RDBMS and HDFS using Sqoop.
  • Experience with Azure transformation projects and Azure architecture decision making; architected and implemented ETL and data movement solutions using Azure Data Factory (ADF) and SSIS.
  • Good working knowledge of processing batch applications.
  • Experienced in writing MapReduce programs and UDFs in Hive.
  • In-depth understanding of DataFrames and Datasets in Spark SQL.
  • Exposure to Apache Kafka for building data pipelines that move logs as streams of messages using producers and consumers (see the Kafka sketch after this list).
  • Hands-on work in GCP (Google Cloud Platform) and good knowledge of GCP as a storage mechanism.
  • Good experience with Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as JSON and Avro.
  • Architect & implement medium to large scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB)
  • Good knowledge of Data Warehousing, ETL development, Distributed Computing and large-scale data processing.
  • Experience in Agile Methodology Using JIRA Scrum tool.
  • Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
  • Strong knowledge of Software Development Life Cycle and expertise in detailed design documentation.
  • Excellent Communication Skills, Ability to perform at a high level and meet deadlines.
  • Developed Spark applications in Python (PySpark) on a distributed environment to load large numbers of CSV files with different schemas into Hive ORC tables (see the PySpark sketch after this list).
  • Worked on reading and writing multiple data formats such as JSON, ORC, and Parquet on HDFS using PySpark.
  • Analyzed existing SQL scripts and redesigned them with PySpark SQL for faster performance.
  • Optimized Hive queries using best practices and the right parameters, leveraging technologies such as Hadoop, YARN, Python, and PySpark.
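
A minimal PySpark sketch of the CSV-to-Hive-ORC loading pattern described above. The HDFS paths, database/table names, and partition column are placeholders, and schema inference is assumed as one way to cope with input files whose schemas differ.

```python
from pyspark.sql import SparkSession

# Spark session with Hive support so tables are registered in the Hive metastore
spark = (SparkSession.builder
         .appName("csv_to_hive_orc")          # hypothetical app name
         .enableHiveSupport()
         .getOrCreate())

# Read CSV files whose schemas may differ; header + inferSchema keep this generic
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("hdfs:///data/landing/*.csv"))     # placeholder HDFS path

# Write into a partitioned Hive table stored as ORC
(df.write
   .mode("append")
   .format("orc")
   .partitionBy("load_date")                  # assumed partition column
   .saveAsTable("staging_db.orders_orc"))     # placeholder database.table

# The same DataFrame can also be persisted in other formats on HDFS
df.write.mode("overwrite").parquet("hdfs:///data/curated/orders_parquet")
df.write.mode("overwrite").json("hdfs:///data/curated/orders_json")
```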
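
A minimal sketch of the Kafka log pipeline mentioned above, using the kafka-python client; the broker address, topic name, and log path are placeholders rather than values from any specific project.

```python
from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["localhost:9092"]      # placeholder broker list
TOPIC = "app-logs"                # placeholder topic

# Producer: publish each log line as a message
producer = KafkaProducer(bootstrap_servers=BROKERS)
with open("/var/log/app/app.log", "rb") as log_file:   # placeholder log path
    for line in log_file:
        producer.send(TOPIC, value=line.rstrip())
producer.flush()

# Consumer: read the stream of log messages from the beginning of the topic
consumer = KafkaConsumer(TOPIC,
                         bootstrap_servers=BROKERS,
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=10000)
for message in consumer:
    print(message.offset, message.value.decode("utf-8", errors="replace"))
```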

TECHNICAL SKILLS

Big Data: Hadoop, Hive, Apache Spark, Spark SQL, ZooKeeper, Data Factory, Sqoop, Hue, HBase, MySQL, ThoughtSpot, SQL, Apache Kafka, BigID, Automic

Data Warehousing: Informatica PowerCenter 9.x/8.x/7.x, Informatica Cloud, Talend Open Studio & Integration Suite, Azure SQL Analytics

Languages: Python, Shell Scripting, Selenium

Database: MySQL, Oracle, Microsoft SQL Server

IDE / Testing Tools: Eclipse, IntelliJ IDEA

Operating System: Windows, UNIX, Linux

Cloud Computing: AWS, Azure, Rackspace, OpenStack

SDLC Methodologies: Agile/Scrum, Waterfall

Scripting Languages: Unix Shell, Python, Windows PowerShell

RDBMS Utilities: Toad, SQL*Plus, SQL*Loader

PROFESSIONAL EXPERIENCE

Confidential

Data Engineer/Big Data Developer

Responsibilities:

  • Worked with Hive to improve the performance and optimization of jobs in Hadoop.
  • Developed custom aggregate functions using Spark SQL and performed interactive querying.
  • Applied a solid understanding of partitioning and bucketing concepts in Hive, and designed both managed and external tables in Hive to optimize performance (see the Hive table sketch after this list).
  • Created Hive external tables, views, and scripts for transformations such as filtering, aggregation, and partitioning tables.
  • Responsible for monitoring production status, ensuring the ETL process works as expected, and handling customer communication around production issues.
  • In-depth knowledge of hardware sizing, capacity management, cluster management and maintenance, performance monitoring and configuration
  • Respond to system generated alerts/escalations relating to any failures on application platform.
  • Designed and implemented database solutions in Azure SQL Data Warehouse and Azure SQL Database.
  • Architect & implement medium to large scale BI solutions on Azure using Azure Data Platform services (Azure Data Lake, Data Factory, Data Lake Analytics, Stream Analytics, Azure SQL DW, HDInsight/Databricks, NoSQL DB).
  • Handled importing of data from various data sources, performed transformations using Hive, and loaded data from Teradata into HDFS.
  • Migrated on-premises Informatica ETL processes to the AWS cloud and Snowflake.
  • Expert in writing business analytics scripts using HiveQL.
  • Worked with Python to clean up log files and produce clearer results (see the log-parsing sketch after this list).
  • Developed custom aggregate functions using Python dictionaries.
  • Used Python regular-expression functions such as findall, search, split, and sub.
  • Worked on automation using Selenium with Python
  • Followed Agile methodology and Scrum meetings to track, optimize, and tailor features to customer needs.
  • Gained strong business knowledge of different product categories and their designs.
  • Involved in developing ThoughtSpot reports and automating workflows to load data.
  • Implemented continuous integration and deployment (CI/CD) through Jenkins for Hadoop jobs.
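
A small sketch, using Spark SQL from PySpark, of the kind of external, partitioned Hive table work described in this section; the database, table, columns, and HDFS location are hypothetical.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# External table: Hive manages only the metadata, the ORC files stay in place
spark.sql("""
    CREATE EXTERNAL TABLE IF NOT EXISTS sales_db.transactions (
        txn_id     STRING,
        amount     DOUBLE,
        store_id   STRING
    )
    PARTITIONED BY (txn_date STRING)
    STORED AS ORC
    LOCATION 'hdfs:///warehouse/external/transactions'
""")

# Typical transformation: filter and aggregate into a summary view
spark.sql("""
    CREATE OR REPLACE VIEW sales_db.daily_store_totals AS
    SELECT txn_date, store_id, SUM(amount) AS total_amount
    FROM sales_db.transactions
    WHERE amount > 0
    GROUP BY txn_date, store_id
""")
```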
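
A minimal sketch of the regex-based log cleanup mentioned in this section, using search, sub, and split from Python's re module; the log line layout and file path are assumptions.

```python
import re

LOG_LINE = r"^(?P<ts>\S+ \S+) (?P<level>[A-Z]+) (?P<msg>.*)$"   # assumed log layout

def summarize_errors(path):
    """Collect ERROR lines and strip noisy request IDs from their messages."""
    errors = []
    with open(path, encoding="utf-8", errors="replace") as fh:
        for line in fh:
            match = re.search(LOG_LINE, line)
            if match and match.group("level") == "ERROR":
                # sub() masks hex request IDs, split() keeps the first clause
                msg = re.sub(r"req-[0-9a-f]+", "req-*", match.group("msg"))
                errors.append(re.split(r"[;|]", msg)[0].strip())
    return errors

if __name__ == "__main__":
    for msg in summarize_errors("/var/log/app/app.log"):   # placeholder path
        print(msg)
```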

Environment: HDFS, Hive, Sqoop, SQL, ZooKeeper, Hortonworks, Hue, Linux, Big Data, UNIX Shell Scripting, Spark, PuTTY, ThoughtSpot, Aorta framework

Confidential

Big Data developer

Responsibilities:

  • Involved in requirements gathering in coordination with business analysts.
  • Responsible for creating technical documents such as High-Level Design and Low-Level Design specifications.
  • Installed and configured Cloudera Manager for easy management of existing Hadoop cluster
  • Configured various property files such as core-site.xml, hdfs-site.xml, yarn-site.xml, mapred-site.xml, and hadoop-env.sh based upon the job requirements.
  • Used Sqoop to transfer data between RDBMS and HDFS.
  • Involved in ETL architecture enhancements to increase the performance using query optimizer.
  • Worked with business functional lead to review and finalize requirements and data profiling analysis.
  • Implemented complex Spark programs to perform joins across different tables (see the join sketch after this list).
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames, and pair RDDs.
  • Created ETL metadata reports using SSRS; reports included execution times for the SSIS packages and failure reports with error descriptions.
  • Responsible for creating tables based on business requirements
  • Produced data visualizations and generated reports for clear results.
  • Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, XML, and JSON.
  • Utilized Agile Scrum methodology to help manage and organize the project, with regular code review sessions.
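
A short PySpark sketch of the DataFrame join pattern referenced in this section, with a broadcast hint shown as one common optimization; the table names, join key, and aggregated columns are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders_join").enableHiveSupport().getOrCreate()

orders = spark.table("sales_db.orders")          # placeholder fact table
customers = spark.table("sales_db.customers")    # placeholder dimension table

# Broadcasting the small dimension table avoids shuffling the large fact table
enriched = orders.join(F.broadcast(customers), on="customer_id", how="left")

# Aggregate per customer segment after the join
summary = (enriched
           .groupBy("segment")
           .agg(F.count("order_id").alias("orders"),
                F.sum("order_total").alias("revenue")))

summary.show(truncate=False)
```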

Environment: Hadoop HDFS, Apache Spark, Spark Core, Spark SQL, Scala, JDK 1.8, CDH 5, Sqoop, MySQL, CentOS Linux

Confidential

SQL Developer

Responsibilities:

  • Involved in database design to create new databases that maintained referential integrity.
  • Generated database SQL scripts and deployed databases including installation and configuration.
  • Created indexed views and appropriate indexes to reduce the running time of complex queries (see the index sketch after this list).
  • Participated in designing a data warehouse to store all information from OLTP to Staging and Staging to Enterprise Data warehouse to do better analysis.
  • Used basic Script Tasks for renaming files and storing row count values in user variables.
  • Actively participated in designing the SSIS Package Model for initial load and incremental load for extracting, transforming, and loading data from and to various RDBMS and Non-RDBMS sources and destinations.
  • Analyzed data from different sources using the Hadoop Big Data solution by implementing Azure Data Factory, Azure Data Lake, Azure Data Lake Analytics, HDInsight, Hive, and Sqoop.
  • Migrated on-premises data (SQL Server) to Azure Data Lake Store (ADLS) using Azure Data Factory.
  • Documentation of all the processes involved in maintaining the database for future reference.
  • Performed unit tests and was involved in deploying database objects to test/production environments.
  • Involved in creating ER diagrams, mapping data into database objects, and designing the database and tables.
  • Created dashboards by blending data from different databases and tables to meet business requirements.
  • Experience in creating Indexes for faster performance and views for controlling user access to data.
  • Performed unit testing, provided bug fixes and deployment support.
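
A minimal Python sketch, using pyodbc, of how index-creation scripts like those in this section could be applied against SQL Server; the connection string, table, and index names are placeholders, and the T-SQL is only an illustrative example.

```python
import pyodbc

# Placeholder connection details for a SQL Server instance
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlserver01;DATABASE=SalesDW;Trusted_Connection=yes;"
)

# Index script kept as plain T-SQL so it can also be checked into source control
CREATE_INDEX = """
IF NOT EXISTS (SELECT 1 FROM sys.indexes WHERE name = 'IX_Orders_CustomerId')
    CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
        ON dbo.Orders (CustomerId)
        INCLUDE (OrderDate, OrderTotal);
"""

with pyodbc.connect(CONN_STR, autocommit=True) as conn:
    cursor = conn.cursor()
    cursor.execute(CREATE_INDEX)
    print("Index script applied.")
```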

Environment: SQL Server 2014, Visual Studio 2014, TFS, SSRS, SSIS, Waterfall methodology
