
Hadoop Developer/Admin Resume


Plano, Texas

PROFESSIONAL SUMMARY

  • Around 9 years of IT experience in database administration and development, including 5+ years in Big Data with Hadoop and its ecosystem across development, test, and production environments in business domains such as finance, health, insurance, and telecommunications, using Scala, Python, and SQL.
  • Hands-on experience with Hadoop and its major ecosystem components: Cloudera Manager, Hortonworks Ambari, HDFS, Hive, Pig, Spark SQL, Spark DataFrames, Spark Datasets, Oozie, Sqoop, MapReduce, and YARN.
  • Excellent understanding of Autosys, Git/GitHub, Bitbucket, and Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, NodeManager, and the MapReduce programming paradigm.
  • Hands-on experience installing, configuring, and using the Hadoop ecosystem: HDFS, MapReduce, Hive, Pig, HBase, Spark, Sqoop, Flume, and Oozie.
  • Hands-on experience with the Cloudera, Hortonworks, and Apache Hadoop distributions.
  • Ingested data from RDBMS sources and various file systems into HDFS using Autosys jobs, Sqoop jobs, Bash scripting, and UNIX command-line utilities.
  • Ingested data onto HDFS using Kafka, which received data from various database providers, for analysis and data processing.
  • Built import and export jobs to copy data to and from HDFS using Sqoop.
  • Experience with complex data processing pipelines, including ETL and data ingestion, dealing with structured, unstructured, and semi-structured data.
  • Exposure to the Cloudera development environment and cluster management using Cloudera Manager.
  • Strong understanding of various Hadoop services and of the MapReduce and YARN architectures.
  • Experience with Spark, Spark SQL and Spark Streaming.
  • Experience with performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins when writing MapReduce jobs.
  • Hands-on experience loading data into Hive partitions under different conditions and managing Hive buckets.
  • Designed and developed MapReduce jobs and Autosys jobs for transferring data from RDBMS sources.
  • Excellent understanding of NoSQL databases such as HBase, MongoDB, and Cassandra.
  • Understanding of Oozie for scheduling Hive, Sqoop, and HBase jobs.
  • Experience with the Microsoft cloud and with setting up clusters on Amazon EC2 and S3, including automating the provisioning and scaling of clusters in the AWS cloud.
  • Hands-on expertise in real-time analytics with Apache Spark (RDDs, DataFrames, and the Streaming API).
  • Experience using RDD lineage to reconstruct lost data when a partition is lost.
  • Used the Spark DataFrames API on the Cloudera platform to perform analytics on Hive data (a minimal sketch follows this list).
  • Real-time experience integrating Hadoop with Apache Kafka; expertise in loading clickstream data from Kafka into HDFS, HBase, and Hive by integrating with Storm.
  • Background with traditional databases such as SQL Server, Teradata, and MySQL.
  • Extensive experience in MS SQL database administration, management, migration, upgrade, production support, and design with MS SQL Server 2016/2014/2012 (with Always On high availability and Azure SQL) and 2008 R2/2005, using SSMS, SSIS, and SSRS in multiple industries.
  • Highly experienced in T-SQL, database design, implementation, deployment, and system administration using Microsoft SQL Server as well as Windows Server 2003, 2008, 2012, and 2012 R2 with VMware virtualization.
  • Highly experienced in data migration from SQL Server to Azure and SQL Server to SQL Server using a side-by-side upgrade process.
  • Expertise in implementing and maintaining efficient disaster recovery strategies and high availability, such as replication (snapshot and transactional with updatable subscriptions), log shipping, and clustering with VMware and Hyper-V (active-active and active-passive).
  • Experience troubleshooting SQL issues using tools such as execution plans and traces.
  • Experience monitoring SQL Server, capturing performance problems, and improving performance by tuning databases.
  • Experience creating and maintaining backup and restore, mirroring, and log-shipping strategies as part of disaster recovery.
  • Major strengths include familiarity with multiple software systems, the ability to learn new technologies quickly and adapt to new environments, self-motivation, and teamwork; a focused, adaptive, and quick learner with excellent interpersonal, technical, and communication skills.
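
The Spark DataFrames work over Hive data mentioned above can be illustrated with a minimal Scala sketch. The claims table, its columns, and the database name here are hypothetical placeholders rather than details from any specific engagement, and the aggregation shown is only one plausible example of this kind of analysis.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object HiveClaimsAnalytics {
      def main(args: Array[String]): Unit = {
        // SparkSession with Hive support so spark.table() can read metastore tables
        val spark = SparkSession.builder()
          .appName("HiveClaimsAnalytics")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical Hive table of insurance claims
        val claims = spark.table("insurance_db.claims")

        // DataFrame aggregation: claim count, total, and average amount per status
        val summary = claims
          .filter(col("claim_amount") > 0)
          .groupBy(col("claim_status"))
          .agg(
            count("*").as("claim_count"),
            sum("claim_amount").as("total_amount"),
            avg("claim_amount").as("avg_amount"))

        // Write the results back to Hive for downstream reporting
        summary.write.mode("overwrite").saveAsTable("insurance_db.claims_summary")
        spark.stop()
      }
    }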

TECHNICAL SKILLS:

Big Data & Databases: Hadoop (HDFS, MapReduce), Sqoop, Hive, Spark, Pig, Kafka, Oozie, MS SQL Server 2005/2008/2012/2014/2016, Teradata, MySQL

Languages: T-SQL, PL/SQL, Bash scripting, Linux/UNIX shell, CMD, Java, Python, and Scala.

PROFESSIONAL EXPERIENCE

Confidential, Plano, Texas

Hadoop Developer/Admin

Responsibilities:

  • Prepared test plans, test cases, test scripts, and test metrics for the application and for database verification, based on the functional requirements, the portal redesign, and test specs.
  • Configured automation with Autosys; used Jira, Git/GitHub, and Bitbucket; and worked with Hadoop architecture and its components such as HDFS, Flowboot, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager, NodeManager, and the MapReduce programming paradigm.
  • Gathered requirements, performed business analysis, and translated business requirements into technical designs on the Hadoop and Big Data platform.
  • Designed and modeled multi-source data on different platforms such as Hive, Spark SQL, and Spark DataFrames/Datasets.
  • Created schemas, UDFs, tables, views, functions, partitions, and buckets, and maintained relationships using indexes and keys.
  • Configured and worked with multi-node Hadoop clusters; installed Cloudera, Apache Hadoop, Hive, Pig, and Spark; and handled commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Primarily responsible for managing and analyzing the Hadoop cluster and different Big Data platforms using shell and Bash scripts and analytic tools including Pig, Hive, and Spark SQL.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Moved data into HDFS from databases such as Teradata, MS SQL, and MySQL, and vice versa, using Sqoop.
  • Scheduled jobs using the Fair Scheduler.
  • Created automated jobs using Bash scripts for loading data from Teradata and the local file system to HDFS, and from HDFS to the Linux file system.
  • Involved in scheduling the Oozie workflow engine to run multiple MapReduce, Hive, and Pig jobs.
  • Used Kafka for messaging and topic subscription, where producers publish to a topic and consumers consume the data via subscription.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on troubleshooting, monitoring, and tuning the performance of MapReduce jobs.
  • Responsible for migrating from older versions to the latest versions of Cloudera and Hortonworks.
  • Moved data by implementing ETL processes in Hive over large sets of structured and semi-structured data.
  • Analyzed and reviewed Hadoop log files, Autosys log files, and system log files.
  • Used Pig built-in functions to convert fixed-width files to delimited files.
  • Responsible for optimizing and tuning Hive, Pig, and Spark to improve performance and solve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation.
  • Managed different RDBMSs and the Hadoop cluster: added and removed cluster nodes, monitored and troubleshot the cluster, managed and reviewed data backups and Hadoop log files, and performed manual failover to verify cluster functioning.
  • Wrote Spark code using Scala and Spark SQL for faster testing and data processing within the Hadoop ecosystem.
  • Developed a Spark Streaming application for one of the data sources using Scala and Spark, applying the required transformations.
  • Exported and imported data from different sources such as HDFS and HBase into Spark RDDs, and from Spark out to different stores.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Closely involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
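
As a sketch of converting a Hive/SQL query into Spark transformations, the following Scala snippet re-expresses a simple HiveQL aggregation as pair-RDD operations. The sales table, its columns, and the database name are hypothetical, and the equivalent HiveQL appears only in a comment for comparison.

    import org.apache.spark.sql.SparkSession

    // Roughly equivalent HiveQL, for comparison:
    //   SELECT region, SUM(amount) FROM sales_db.sales GROUP BY region;
    object HiveQueryAsRdd {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HiveQueryAsRdd")
          .enableHiveSupport()
          .getOrCreate()

        // Read the (hypothetical) Hive table and drop down to an RDD of rows
        val salesRdd = spark.table("sales_db.sales").rdd

        // The GROUP BY becomes a map to (region, amount) followed by reduceByKey
        val totalsByRegion = salesRdd
          .map(row => (row.getAs[String]("region"), row.getAs[Double]("amount")))
          .reduceByKey(_ + _)

        totalsByRegion.collect().foreach { case (region, total) =>
          println(s"$region -> $total")
        }
        spark.stop()
      }
    }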

Environment: Hadoop, Cloudera Manager, Hortonworks Ambari, Teradata, HDFS, MapReduce, Spark SQL, Hive, Pig, Sqoop, Bash scripting, Java, Scala, Ubuntu, Linux Red Hat, multi-node Hadoop cluster.

Confidential, White Plains, New York

Hadoop Developer/Admin

Responsibilities:

  • Involved in requirement gathering and business analysis, and translated business requirements into technical designs on the Hadoop and Big Data platform.
  • Configured and worked with multi-node Hadoop clusters; installed Cloudera, Apache Hadoop, Hive, Pig, and Spark; and handled commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Responsible for analyzing the Hadoop cluster with different Big Data analytic tools, including Pig, Hive, and Spark.
  • Imported and exported data between HDFS and different databases using Sqoop.
  • Loaded data from the local file system to HDFS and from HDFS to the Linux file system.
  • Performed architecture design, data modeling, and implementation of SQL, Big Data platform, and analytic applications for consumer products.
  • Implemented test scripts to support test driven development and continuous integration.
  • Worked on troubleshooting, monitoring, and tuning the performance of MapReduce jobs.
  • Responsible for managing data coming from different sources.
  • Transformed the data by applying ETL processes in Hive over large sets of structured and semi-structured data.
  • Exported the analyzed data to relational databases using Sqoop for visualization and generated reports using SQL Server BIDS or Data Tools.
  • Managed different jobs using the Fair Scheduler.
  • Used Pig built-in functions to convert fixed-width files to delimited files.
  • Responsible for optimizing and tuning Hive, Pig, and Spark to improve performance and solve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation.
  • Queried data with Scala and Spark SQL for faster testing and data processing.
  • Exported and imported data from different sources such as HDFS and HBase into Spark RDDs, and from Spark out to different stores.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
  • Utilized Kafka for messaging and topic subscription, where producers publish to a topic and consumers consume the data via subscription (see the streaming sketch after this list).
  • Maintained various production databases; handled installation, configuration, and upgrades of SQL Server 2008 R2/2012/2014/2016, along with service packs and hotfixes for MS SQL Server 2008/2012.
  • Responsible for maintaining SQL Server high-availability solutions at different levels with failover clustering, replication, database snapshots, log shipping, and database mirroring.
  • Responsible for capacity planning, immediate performance fixes, performance tuning, troubleshooting, disaster recovery, and backup and restore procedures.
  • Solid hands-on performance monitoring with Activity Monitor, SQL Profiler, Performance Monitor, Database Engine Tuning Advisor, DMVs, and SQL Diagnostic Manager.
  • Responsible for implementing different replication models such as transactional, snapshot, merge, and peer-to-peer.
  • Extracted, transformed, and loaded (ETL) data from Big Data sources to SQL Server using SQL Server Integration Services (SSIS) packages in BIDS and SQL Data Tools.
  • Performed performance tuning of queries and stored procedures using graphical execution plans and Query Analyzer, and monitored performance using SQL Server Profiler and Database Engine Tuning Advisor.
  • Conducted root-cause analysis of application availability and narrowed down issues related to coding practices, database bottlenecks, or network latency.
  • Created MS SSIS packages for executing the required tasks, and created jobs scheduled to run daily.
  • Experience with table/index partitioning and full-text search, and with tuning the production server for performance improvement.
  • Kept DEV/QA/Test and production servers in sync; installed and reviewed SQL Server patches as well as service packs.
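
The Kafka topic consumption described above can be sketched in Scala with Spark Structured Streaming. The broker address, topic name, and HDFS paths are hypothetical, and the spark-sql-kafka connector is an assumed choice for illustration rather than the exact consumer stack used on this project.

    import org.apache.spark.sql.SparkSession

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("KafkaToHdfs")
          .getOrCreate()

        // Subscribe to a (hypothetical) topic; needs the spark-sql-kafka-0-10 package on the classpath
        val events = spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "clickstream")
          .option("startingOffsets", "latest")
          .load()

        // Kafka delivers key/value as binary; cast to strings before landing the data
        val decoded = events.selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")

        // Continuously append the consumed records to HDFS as Parquet
        val query = decoded.writeStream
          .format("parquet")
          .option("path", "hdfs:///data/clickstream")
          .option("checkpointLocation", "hdfs:///checkpoints/clickstream")
          .start()

        query.awaitTermination()
      }
    }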

Environment: Hadoop, HDFS, MapReduce, Hive, Pig, Spark, Sqoop, HBase, Java, Scala, shell scripting, Linux Red Hat, multi-node Hadoop cluster, SQL cluster, SQL Server 2012/2014 (SQL Server Management Studio).

Confidential, Warren, New Jersey

Hadoop Developer/SQL DBA

Responsibilities:

  • Configured and worked with multi-node Hadoop clusters; installed Cloudera, Apache Hadoop, Hive, and Pig; and handled commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning, and slot configuration.
  • Developed Pig scripts to transform the raw data into intelligent data as specified by business users.
  • Worked closely with the data modelers to model the new incoming data sets.
  • Involved in the end-to-end process of Hadoop jobs that used various technologies such as Sqoop, Pig, Hive, MapReduce, and shell scripts (for scheduling of a few jobs).
  • Expertise in designing and deploying Hadoop clusters and different Big Data analytic tools, including Pig, Hive, Oozie, ZooKeeper, Sqoop, Flume, Spark, Impala, and Cassandra, with the Hortonworks distribution.
  • Maintained Hadoop, MapReduce processes, HDFS, and the AWS platform, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
  • Involved in creating Hive and Pig tables, loading data, and writing Hive queries and Pig scripts (a partitioned-table sketch follows this list).
  • Imported data from different sources such as RDBMS and HBase into Hadoop.
  • Transformed the data by applying ETL processes in Hive over large sets of structured, semi-structured, and unstructured data.
  • Exported the analyzed data to relational databases using Sqoop for visualization and generated reports using SQL Server BIDS or Data Tools.
  • Configured, deployed, and maintained multi-node dev and test Kafka clusters.
  • Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Experience with the Oozie workflow scheduler to manage Hadoop jobs as a directed acyclic graph (DAG) of actions with control flows.
  • Worked on tuning Hive and Pig to improve performance and solve performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
  • Created and maintained various databases for production, development, and testing servers using MS SQL Server 2005, 2008, and 2012 with Always On availability groups.
  • Extracted, transformed, and loaded (ETL) data from heterogeneous data sources to SQL Server using SQL Server Integration Services (SSIS) packages.
  • Responsible for database and log backup and restore processes, backup strategies, and backup scheduling.
  • Performed performance tuning of queries and stored procedures using graphical execution plans and Query Analyzer, and monitored performance using SQL Server Profiler and Database Engine Tuning Advisor.
  • Implemented transactional replication between the primary server and read-only servers.
  • Experience configuring Report Server, Report Manager, and report scheduling, and granting permissions to different levels of users in SSRS 2005/2008/2008 R2.
  • Responsible for implementing and maintaining efficient disaster recovery strategies and high availability, such as replication (snapshot and transactional with updatable subscriptions), log shipping, and clustering with VMware (active-active and active-passive).
  • Conducted root-cause analysis of application availability and narrowed down issues related to coding practices, database bottlenecks, or network latency.
  • Created MS SSIS packages for executing the required tasks, and created jobs scheduled to run daily.
  • Designed and configured databases, tables, indexes, stored procedures, functions, and triggers.
  • Troubleshot database status, performance, replication, and log shipping, as well as various errors on production, development, and UAT servers.
  • Experienced in point-in-time restores of databases on production and development servers.
  • Migrated DTS packages to SSIS to import and export data from SQL Server.
  • Monitored SQL Server performance both ad hoc and proactively.
  • Designed, implemented, and tuned T-SQL scripts and stored procedures.
  • Provided 24/7 support to SQL application developers in implementing applications on the production server.
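
The Hive table creation and loading referenced earlier in this list can be sketched in Scala by issuing HiveQL through Spark's Hive support (the same statements could equally run in the Hive CLI). The orders table, its columns, the staging table, and the partition column are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    object HivePartitionedLoad {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("HivePartitionedLoad")
          .enableHiveSupport()
          .getOrCreate()

        // Partitioned target table, stored as Parquet (hypothetical schema)
        spark.sql(
          """CREATE TABLE IF NOT EXISTS sales_db.orders_part (
            |  order_id BIGINT,
            |  customer_id BIGINT,
            |  amount DOUBLE)
            |PARTITIONED BY (load_date STRING)
            |STORED AS PARQUET""".stripMargin)

        // Allow dynamic partition inserts, then load from a raw staging table
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
        spark.sql(
          """INSERT OVERWRITE TABLE sales_db.orders_part PARTITION (load_date)
            |SELECT order_id, customer_id, amount, load_date
            |FROM sales_db.orders_staging""".stripMargin)

        spark.stop()
      }
    }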

Environment: Apache Hadoop, HDFS, MapReduce, Sqoop, Flume, Pig, Hive, HBase, Oozie, Java, Scala, Kafka, Linux, MS SQL Server 2012 (SQL Server Management Studio, Query Analyzer, SQL Profiler, Index Tuning Wizard, SSIS).

Confidential, Marietta, Ohio

SQL DBA

Responsibilities:

  • Created and maintained various databases for production, development, and testing servers using MS SQL Server 2005; installed and configured SQL Server 2005/2008 on server and client machines.
  • Extracted, transformed, and loaded data from heterogeneous data sources to SQL Server using SQL Server Integration Services (SSIS) packages.
  • Experience with DBCC utilities to maintain the consistency and integrity of each database on the production server.
  • Conducted root-cause analysis of application availability and narrowed down issues related to coding practices, database bottlenecks, or network latency.
  • Resolved locking and blocking issues.
  • Experience with database consistency checks using stored procedures, DBCC, and dynamic management views (DMVs).
  • Experience in table/index partitioning and full text search.
  • Tuned the production server for performance improvement.
  • Ran the Index Tuning Wizard to identify missing indexes.
  • Used Microsoft diagnostic utilities to take memory dumps and automated traces.
  • Implemented and monitored SQL Server jobs using custom scripts.
  • Monitored and analyzed SQL Server logs and application logs.
  • Connected to SQL Server remotely using Terminal Services.
  • Created backup strategies and schedules using third-party tools such as LiteSpeed as part of database maintenance plans.
  • Created MS SSIS packages for executing the required tasks, and created jobs scheduled to run daily.
  • Involved in performance tuning of source and target databases.

Environment: SQL Server (SQL Server Management Studio, Query Analyzer, SQL Profiler, Index Tuning Wizard), SSIS, Visual Studio.

Confidential, New York, New York

Network Admin/SQL DBA

Responsibilities:

  • Installed and configured workstations and Windows Server 2003.
  • Installed, configured, and upgraded SQL Server 2000/2005/2008, including service packs and hotfixes, using MS SQL Server 2008.
  • Planned the placement of MS SQL Server 2005 data and transaction log files on disk.
  • Responsible for, and knowledgeable about, SQL Server high-availability solutions with failover clustering, database snapshots, log shipping, and database mirroring.
  • Responsible for capacity planning, performance tuning, disaster recovery troubleshooting, and backup and restore procedures.
  • Solid hands-on performance monitoring with Activity Monitor, SQL Profiler, and Performance Monitor; maintained various databases on production, development, and testing servers using the Tuning Advisor, SP commands, DMVs, and DMFs.
  • Responsible for implementing different replication models such as transactional, snapshot, merge, and peer-to-peer.
  • Used DBCC utilities to maintain the consistency and integrity of each database on the production server.
  • Conducted root-cause analysis of application availability and narrowed down issues related to coding practices, database bottlenecks, or network latency.
  • Resolved locking, blocking, and deadlocking issues.
  • Suggested disk capacity, processors, and memory based on capacity planning.
  • Experience with desktop/laptop network connectivity and end-user support.
  • Handled system configuration, patches, and drivers; set up user and administrative accounts; and assigned permissions and security.
  • Deployed and managed Exchange Server, including server backup, email backup, and end-user support.
  • Created and managed Windows Server 2003 security, including monitoring system performance, Event Viewer logs, the firewall, virus protection, Windows Update, and spyware protection.

Environment: SQL Server (SQL Server Management Studio, Query Analyzer, SQL Profiler, Index Tuning Wizard), SSRS, SSIS, Visual Studio.
