Hive / Hadoop Infra Engineer Resume
Danville, PA
SUMMARY
- Over 12 years of experience in the analysis, design, development, testing, and upgrading of applications, including hands-on experience with Big Data, Hive, and Java in implementing complete Hadoop solutions
- Deployed and maintained a Hadoop stack spanning over 2,500 nodes
- Implemented a data warehouse solution using Hive and Spark and enabled Hive ACID transactions for change data capture (CDC)
- Experience and expertise with DevOps tools such as Jenkins, Puppet, Git, and Gerrit
- Cloudera Certified Developer for Apache Hadoop (CCDH, exam CCD-410)
- Strong experience and expertise with the ETL tools MetaSuite and DataStage; the databases DB2, SQL Server, Oracle, Netezza, and Sybase; and Java and Python
- Expert in understanding data and designing/implementing enterprise platforms such as Hadoop data lakes and large data warehouses
- Expertise in writing Hadoop jobs for analyzing data using Hive and Pig
- Hands-on experience installing, configuring, and using Apache Hadoop ecosystem components such as HDFS, MapReduce, ZooKeeper, Oozie, Hive, Sqoop, and Pig
- Set up Ranger policies to streamline Hadoop access for different user groups across the cluster (a REST sketch follows this list)
- Hands-on experience with the Cloudera and Hortonworks Hadoop distributions (HDFS, MapReduce, Hive, Pig, Spark, Sqoop, Oozie, HBase, Flume, Kafka)
- Experience in the ETL migration process; involved in an EDW offload project
- Expertise in ingesting data from various RDBMS systems into the Hadoop platform using Sqoop
- Experience with the Spark platform in Scala and Python
- In-depth understanding of Hadoop architecture and its components, including HDFS, YARN, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts
- Good understanding of NoSQL databases such as HBase
- Set up the configuration and worked with Flume to ingest data from a spooling directory into HDFS
- Experience working on high-availability, high-traffic applications
- Expertise in bug fixing and problem solving
- Ability to move data in and out of Hadoop from various RDBMS, UNIX, and mainframe systems using Sqoop and other traditional data-movement technologies
- Able to assess business rules, collaborate with upstream source owners, and perform source-to-target data mapping, design, and review
- Experience in writing Pig and Hive scripts and extending Hive and Pig core functionality by writing custom UDFs.
- Experience with the Oozie workflow engine and Autosys for running workflow jobs with actions that execute Hadoop MapReduce, Pig, and Hive scripts
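A minimal sketch of the Ranger policy setup noted above, assuming Ranger's public REST API v2; the host, credentials, service name, database, and group are illustrative placeholders:

```python
import requests

# Hypothetical Ranger admin endpoint and credentials -- placeholders, not real values.
RANGER_URL = "http://ranger-admin:6080/service/public/v2/api/policy"
AUTH = ("admin", "admin")

# Grant the 'analysts' group read access to every table in a Hive database.
policy = {
    "service": "hadoopdev_hive",  # assumed Ranger service name
    "name": "analysts_read_sales",
    "resources": {
        "database": {"values": ["sales"], "isExcludes": False, "isRecursive": False},
        "table": {"values": ["*"]},
        "column": {"values": ["*"]},
    },
    "policyItems": [{
        "groups": ["analysts"],
        "accesses": [{"type": "select", "isAllowed": True}],
    }],
}

resp = requests.post(RANGER_URL, json=policy, auth=AUTH)
resp.raise_for_status()
print("Created policy id:", resp.json().get("id"))
```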
TECHNICAL SKILLS
- Linux, MVS (z/OS), CentOS, UNIX
- Apache Hadoop, HDFS, MapReduce, Hive, HBase, HiveQL, Spark, Sqoop, Flume, Pig, Oozie, ZooKeeper, Ranger, Ambari, Kafka
- Jenkins, Puppet, GIT, Gerrit
- DB2, HBase, Cassandra, SQL Server, Oracle, Netezza, Sybase, MySQL
- MetaSuite, DataStage
- Oozie, TWS, JobTrac, CA-7, Autosys, Nagios, OpenTSDB, Dr. Elephant
- ChangeMan, ClearCase, DCCS, PVCS Version Manager, Git, Gerrit
- Peregrine, QualityCenter (ALM), Remedy, JIRA
- Pega PRPC 6.2/6.3
- Core Java, Python
PROFESSIONAL EXPERIENCE
Confidential - Danville, PA
Hive / Hadoop Infra Engineer
Responsibilities:
- Worked closely with business analysts to convert business requirements into technical requirements and prepared low- and high-level design documentation
- Interacted with business analysts to define the business logic to implement in the data lake platform
- Designed and developed the solution, allocated tasks to developers, coordinated all tasks, and delivered components on time for implementation
- Developed and deployed DBTOOLKIT, a platform that collects Hadoop metrics and Hive metastore details using the pandas library in Python (a sketch follows this list)
- Designed, developed, and maintained the Geisinger Health Plan on the UDA platform, replicating the ODS data model and creating a data model to suit the requirements
- Automated the workflow process for the data model to derive Hive job dependencies and automated job execution using Python (see the dependency-runner sketch after this list)
- Developed and maintained applications that decommission source systems, operating on Hadoop technologies such as MapReduce, Hive, Spark, Spark SQL, HBase, and Elasticsearch using SQL, Python, and shell scripting
- Set up an ingestion pipeline to ingest data into the Hadoop platform from sources such as SQL Server and Teradata using Sqoop (an import sketch follows this list)
- Built an automated process to support the migration of reconciled datasets into the existing pipeline
- Created Hive UDFs to make data analytics easier for end users
- Developed the migration process using MapReduce, focusing on extracting, parsing, validating, type-checking, de-duplicating, and analyzing large datasets, which led to optimization of the ETL workflows
- Optimized Hive query performance by applying optimization techniques such as partitioning, bucketing, vectorization, appropriate file formats, and compression
- Improved the overall performance of the ETL process by managing and reviewing Hadoop log files and evaluating applications to support existing processes in production
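A minimal sketch of a DBTOOLKIT-style metastore collector, assuming a MySQL-backed Hive metastore queried through SQLAlchemy; the host, credentials, and connection string are placeholders, while the TBLS/DBS/TABLE_PARAMS tables reflect the stock metastore schema:

```python
import pandas as pd
from sqlalchemy import create_engine

# Connection to the metastore's backing MySQL database (placeholder credentials).
engine = create_engine("mysql+mysqlconnector://hive:secret@metastore-db/hive")

# Pull per-table details: owner, creation time, and row-count statistics.
query = """
    SELECT d.NAME AS db_name, t.TBL_NAME, t.OWNER, t.CREATE_TIME,
           p.PARAM_VALUE AS num_rows
    FROM TBLS t
    JOIN DBS d ON t.DB_ID = d.DB_ID
    LEFT JOIN TABLE_PARAMS p
           ON p.TBL_ID = t.TBL_ID AND p.PARAM_KEY = 'numRows'
"""
metrics = pd.read_sql(query, engine)

# Aggregate table counts per database for reporting.
print(metrics.groupby("db_name")["TBL_NAME"].count())
```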
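A minimal sketch of the dependency-driven Hive job runner, assuming job dependencies are declared up front and each job is a HiveQL script executed through beeline; the job names, script paths, and JDBC URL are illustrative:

```python
import subprocess
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical Hive job dependency map: job -> set of upstream jobs.
deps = {
    "load_claims": set(),
    "load_members": set(),
    "build_dim": {"load_members"},
    "build_fact": {"load_claims", "build_dim"},
}

JDBC_URL = "jdbc:hive2://hiveserver:10000/default"  # placeholder

# Run each job's HiveQL script only after its upstream jobs have finished.
for job in TopologicalSorter(deps).static_order():
    print(f"Running {job} ...")
    subprocess.run(
        ["beeline", "-u", JDBC_URL, "-f", f"/jobs/{job}.hql"],
        check=True,
    )
```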
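A minimal sketch of the Sqoop ingestion leg, driven from Python; the JDBC URL, credentials, table, and paths are placeholders:

```python
import subprocess

# Ingest a SQL Server table into HDFS with Sqoop (all identifiers are placeholders).
subprocess.run(
    [
        "sqoop", "import",
        "--connect", "jdbc:sqlserver://sqlhost:1433;databaseName=claims",
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop.pwd",  # keep credentials off the command line
        "--table", "member",
        "--target-dir", "/landing/claims/member",
        "--split-by", "member_id",                  # column used to parallelize mappers
        "--num-mappers", "4",
        "--as-parquetfile",
    ],
    check=True,
)
```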
Environment: Hortonworks 3.0.1, MapReduce Framework, Spark, SQL, Hive, HBase, Jenkins, IntelliJ, GitHub
Confidential - San Jose, CA
Hive / Hadoop Infra Engineer
Responsibilities:
- Built fault-tolerant, scalable batch and real-time distributed data processing systems using Hive, MapReduce, Tez, HBase, Java/Python, Spark, etc.
- Debugged and troubleshot Hive queries against big data sets
- Deployed the tunnel setup to connect MongoDB to Hadoop in a different zone (a sketch follows this list)
- Implemented a data warehouse solution using Hive and Spark, enabled Hive ACID transactions for CDC, and worked with Hive ACID tables
- Automated metrics gathering using Python and automated the execution process using shell scripting and crontab (see the sketch after this list)
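A minimal sketch of the tunnel setup, assuming an SSH tunnel built with the sshtunnel package and a pymongo client; hosts, ports, and the key path are placeholders:

```python
from sshtunnel import SSHTunnelForwarder
from pymongo import MongoClient

# Forward a local port through a bastion host to MongoDB in the other zone
# (all hosts, ports, and the key path are placeholders).
with SSHTunnelForwarder(
    ("bastion.other-zone.example", 22),
    ssh_username="etl",
    ssh_pkey="/home/etl/.ssh/id_rsa",
    remote_bind_address=("mongo.other-zone.example", 27017),
    local_bind_address=("127.0.0.1", 27017),
):
    client = MongoClient("mongodb://127.0.0.1:27017/")
    # Read from MongoDB here, then land the documents in HDFS downstream.
    print(client["claims"]["events"].estimated_document_count())
```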
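A minimal sketch of the cron-driven metrics automation; the metric collected and the output paths are illustrative:

```python
#!/usr/bin/env python3
# collect_metrics.py -- scheduled from crontab, e.g.:
#   */15 * * * * /usr/bin/python3 /opt/tools/collect_metrics.py >> /var/log/metrics.log 2>&1
import json
import shutil
import time

# Gather a simple host metric (disk usage of the data mount -- path is a placeholder).
usage = shutil.disk_usage("/data")
record = {
    "ts": int(time.time()),
    "disk_total": usage.total,
    "disk_used": usage.used,
}

# Append one JSON record per run for downstream aggregation.
with open("/var/log/metrics.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```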
Confidential, CA
Hive Infrastructure Engineer
Responsibilities:
- Built fault-tolerant, scalable batch and real-time distributed data processing systems using Hive, MapReduce, Tez, HBase, Java/Python, Kafka, Spark, etc.
- Maintained and supported the existing Apache Hive platform and evolved it to newer tech stacks and architectures
- Debugged, troubleshot, and optimized Hive queries against big data sets
- Provided operational excellence through root-cause analysis and continuous improvement
- Experience designing and optimizing data queries in the Hadoop environment using tools such as Hive EXPLAIN and Dr. Elephant
- Configuration/Change Management through Puppet
- Worked with the Data Infrastructure team to set up Kafka to ingest RTBIDS data into HDFS (a consumer sketch follows this list)
- Deployed Hadoop stack components such as Hive, Spark, and Tez through Jenkins
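A minimal sketch of the Kafka-to-HDFS ingestion, assuming the kafka-python and hdfs (WebHDFS) packages; brokers, topic, URLs, and paths are placeholders, and the target file is assumed to already exist for append:

```python
from kafka import KafkaConsumer
from hdfs import InsecureClient

# Consume bid events and append them to a file in HDFS
# (brokers, topic, WebHDFS URL, and paths are placeholders).
consumer = KafkaConsumer(
    "rtbids",
    bootstrap_servers=["kafka1:9092", "kafka2:9092"],
    group_id="rtbids-hdfs-sink",
    auto_offset_reset="earliest",
)
hdfs = InsecureClient("http://namenode:9870", user="etl")

batch = []
for msg in consumer:
    batch.append(msg.value.decode("utf-8"))
    if len(batch) >= 1000:
        # Flush the batch as newline-delimited records.
        with hdfs.write("/landing/rtbids/events.jsonl",
                        append=True, encoding="utf-8") as writer:
            writer.write("\n".join(batch) + "\n")
        batch.clear()
```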
Confidential, Berkeley Heights, NJ
Hadoop/Big Data WS Lead / Hadoop/Big Data Senior Developer
Responsibilities:
- Curated data may consist of flat denormalized tables or a star schema
- NGE submission data is augmented with all attributes needed for linking to the CDH conformed dimensions in Netezza
- Data in the curated zone is transformed by business rules and data rules
- Data is provisioned to Netezza through Sqoop export and pulled through DataStage jobs (a Sqoop export sketch follows)
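A minimal sketch of the Sqoop export leg; the Netezza JDBC URL, credentials, table, and export directory are placeholders:

```python
import subprocess

# Export curated data from HDFS to Netezza with Sqoop (identifiers are placeholders).
subprocess.run(
    [
        "sqoop", "export",
        "--connect", "jdbc:netezza://nzhost:5480/CDH",
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop.pwd",
        "--table", "CONFORMED_DIM_MEMBER",
        "--export-dir", "/curated/nge/member",
        "--input-fields-terminated-by", "|",  # delimiter of the curated files
        "--num-mappers", "4",
    ],
    check=True,
)
```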