
Hive / Hadoop Infra Engineer Resume


Danville, PA

SUMMARY

  • Over 12 years of experience in the analysis, design, development, testing and upgrade of applications, with hands-on experience in Big Data, Hive and Java implementing complete Hadoop solutions
  • Deployed and maintained the Hadoop stack across more than 2,500 nodes
  • Implemented a data warehouse solution using Hive and Spark and enabled Hive ACID transactions for CDC (see the sketch after this list)
  • Experience and expertise in DevOps tools such as Jenkins, Puppet, Git and Gerrit
  • CCDH-410 (Cloudera Certified Developer for Apache Hadoop) certified.
  • Strong experience and expertise working with the ETL tools MetaSuite and DataStage, the databases DB2, SQL Server, Oracle, Netezza and Sybase, and Java and Python
  • Expert in understanding data and designing/implementing enterprise platforms such as Hadoop data lakes and large data warehouses
  • Expertise in writing Hadoop Jobs for analyzing data using Hive and Pig.
  • Hands-on experience in installing, configuring and using Apache Hadoop ecosystem components such as HDFS, Hadoop MapReduce, ZooKeeper, Oozie, Hive, Sqoop and Pig
  • Set up Ranger policies to streamline Hadoop access for different user groups across the cluster
  • Hands-on experience with the Cloudera/Hortonworks Hadoop distributions (HDFS, MapReduce, Hive, Pig, Spark, Sqoop, Oozie, HBase, Flume, Kafka)
  • Experience in the ETL migration process, involved in an EDW offload project
  • Expertise in ingesting data from different RDBMS sources into the Hadoop platform using Sqoop
  • Experience with Spark-Scala/Python platform
  • In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, YARN, Job tracker, Task Tracker, Name Node, Data Node and MapReduce concepts.
  • Good understanding of NoSQL databases such as HBase
  • Set up the configuration and worked with Flume to ingest data from a spooling directory into HDFS
  • Experience working on High Availability and High Traffic applications.
  • Expertise in bug fixing and Problem solving.
  • Ability to move the data in and out of Hadoop from various RDBMS, UNIX and Mainframe system using Sqoop and other traditional data movement technologies
  • Able to assess business rules, collaborate with upstream source holders and perform source-to-target data mapping, design and review.
  • Experience in writing Pig and Hive scripts and extending Hive and Pig core functionality by writing custom UDFs.
  • Experience with Oozie workflow engine and Autosys in running workflow jobs with actions that run Hadoop MapReduce, Pig and Hive Scripts.
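
Illustrative sketch of the Hive ACID setup referenced above (table and column names are hypothetical, not taken from any engagement): a bucketed ORC table declared transactional so that CDC deltas can later be merged into it.

    -- Hypothetical ACID (transactional) Hive table used as a CDC target.
    -- ACID tables must be stored as ORC, and the cluster needs transactions enabled
    -- (hive.support.concurrency=true,
    --  hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager).
    CREATE TABLE member_dim (
      member_id   BIGINT,
      member_name STRING,
      plan_code   STRING,
      updated_ts  TIMESTAMP
    )
    CLUSTERED BY (member_id) INTO 16 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('transactional'='true');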

TECHNICAL SKILLS

  • Linux, MVS (z/OS), CentOS, UNIX
  • Apache Hadoop, HDFS, MapReduce, Hive, HBase, Hive QL, Spark, Sqoop, Flume, Pig, Oozie, ZooKeeper, Ranger, Ambari, Kafka
  • Jenkins, Puppet, Git, Gerrit
  • DB2, HBase, Cassandra, SQL Server, Oracle Server, Netezza, Sybase, MySQL
  • MetaSuite, DataStage
  • Oozie, TWS, JobTrac, CA-7, Autosys, Nagios, OpenTSDB, Dr. Elephant
  • ChangeMan, ClearCase, DCCS, PVCS Version Manager, Git, Gerrit
  • Peregrine, Quality Center (ALM), Remedy, JIRA
  • PEGA-PRPC 6.2 /6.3
  • Core Java, Python

PROFESSIONAL EXPERIENCE

Confidential - Danville, PA

Hive / Hadoop Infra Engineer

Responsibilities:

  • Worked closely with the business analysts to convert business requirements into technical requirements and prepared low- and high-level documentation
  • Interacted with the business analysts to define the business logic to implement on the data lake platform
  • Designed and developed the solution, allocated tasks to developers, coordinated all tasks and delivered the components on time for implementation
  • Developed and deployed DBTOOLKIT, a platform to collect all the Hadoop metrics and Hive metastore details, using the Pandas library in Python
  • Designed, developed and maintained the Geisinger Health Plan data on the UDA platform, replicating the ODS data model and creating a data model to suit the requirements
  • Automated the workflow process for the data model to derive Hive job dependencies and automated job execution using Python
  • Developed and maintained applications to decommission source systems and operate using Hadoop technologies such as Hadoop MR, Hive, Spark, Spark SQL, HBase and Elasticsearch, using SQL, Python and shell scripting
  • Set up an ingestion pipeline to ingest data into the Hadoop platform from various sources such as SQL Server and Teradata using Sqoop
  • Built an automated process to support the migration of reconciled datasets into the existing pipeline
  • Created Hive UDFs to make data analytics easier for end users
  • Developed the migration process using MapReduce, focusing on extracting, parsing, validating, type-checking, de-duplicating and analyzing large datasets, which led to optimization of the ETL workflows
  • Optimized Hive query performance by implementing Hive optimization techniques such as partitioning, bucketing, vectorization, file formats and compression (see the sketch after this list)
  • Improved the overall performance of the ETL process by managing and reviewing Hadoop log files and evaluating applications to support the existing process in production
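
A minimal sketch of the partitioning/bucketing/ORC approach described above (table and column names are illustrative only, not taken from the actual project):

    -- Hypothetical partitioned, bucketed ORC table with compression.
    CREATE TABLE claims_curated (
      claim_id     BIGINT,
      member_id    BIGINT,
      claim_amount DECIMAL(12,2)
    )
    PARTITIONED BY (claim_year INT)
    CLUSTERED BY (member_id) INTO 32 BUCKETS
    STORED AS ORC
    TBLPROPERTIES ('orc.compress'='ZLIB');

    -- Load with dynamic partitioning and vectorized execution enabled.
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.vectorized.execution.enabled=true;
    INSERT OVERWRITE TABLE claims_curated PARTITION (claim_year)
    SELECT claim_id, member_id, claim_amount, year(claim_date) AS claim_year
    FROM claims_raw;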

Environment: Hortonworks 3.0.1, MapReduce Framework, Spark, SQL, Hive, HBase, Jenkins, IntelliJ, GitHub

Confidential - San Jose, CA

Hive / Hadoop Infra Engineer

Responsibilities:

  • Built fault-tolerant, scalable batch and real-time distributed data processing systems using Hive, MapReduce, Tez, HBase, Java/Python, Spark, etc.
  • Debugged and troubleshot Hive queries against big data sets
  • Deployed the tunnel setup to connect MongoDB to Hadoop in a different zone
  • Implemented a data warehouse solution using Hive and Spark, enabled Hive ACID transactions for CDC and worked with Hive ACID tables (see the MERGE sketch after this list)
  • Automated metrics gathering using Python and automated the execution process using shell scripts and crontab
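
A minimal sketch of applying CDC change records with a Hive ACID MERGE, as referenced above (table and column names are hypothetical; the target must be a transactional table):

    -- Hypothetical CDC merge: apply staged change records to an ACID target table.
    MERGE INTO member_dim AS t
    USING member_changes AS s
    ON t.member_id = s.member_id
    WHEN MATCHED AND s.op = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET member_name = s.member_name,
                                 plan_code   = s.plan_code,
                                 updated_ts  = s.updated_ts
    WHEN NOT MATCHED THEN INSERT
      VALUES (s.member_id, s.member_name, s.plan_code, s.updated_ts);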

Confidential, CA

Hive Infrastructure Engineer

Responsibilities:

  • Built fault-tolerant, scalable batch and real-time distributed data processing systems using Hive, MapReduce, Tez, HBase, Java/Python, Kafka, Spark, etc.
  • Maintained and supported the existing Apache Hive platform and evolved it to newer tech stacks and architectures
  • Debugged, troubleshot and optimized Hive queries against big data sets.
  • Provided operational excellence through root cause analysis and continuous improvement
  • Experience in designing and optimizing queries against data in the Hadoop environment using tools such as Hive EXPLAIN and Dr. Elephant (see the sketch after this list)
  • Configuration/change management through Puppet
  • Worked with the Data Infrastructure team to set up Kafka to ingest RTBIDS data into HDFS
  • Deployed Hadoop stack components such as Hive, Spark and Tez through Jenkins
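
Illustrative example of the query-tuning workflow mentioned above: Hive EXPLAIN is run to inspect the execution plan before changing a query, and Dr. Elephant is consulted on the finished job's counters (the query and table names are hypothetical):

    -- Hypothetical example: inspect the execution plan of a join before tuning it.
    EXPLAIN
    SELECT c.member_id, sum(c.claim_amount) AS total_paid
    FROM   claims_curated c
    JOIN   member_dim     m ON c.member_id = m.member_id
    GROUP BY c.member_id;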

Confidential, Berkeley Heights, NJ

Hadoop/Big Data WS Lead / Hadoop/Big Data Senior Developer

Responsibilities:

  • Curated data may consist of flat de-normalized tables or a star schema (see the sketch after this list)
  • NGE submission data will be augmented with all attributes needed for linking to the CDH conformed dimensions in Netezza
  • Data in the curated zone will be transformed by business rules and data rules
  • Data is provisioned to Netezza through Sqoop export and pulled through DataStage jobs
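
A minimal sketch of producing a flat, de-normalized curated table from star-schema sources before it is exported to Netezza (schema, table and column names are hypothetical):

    -- Hypothetical example: flatten a fact table against its conformed dimensions
    -- into a de-normalized curated table that is later exported via Sqoop.
    CREATE TABLE curated.submission_flat
    STORED AS ORC AS
    SELECT f.submission_id,
           f.submission_ts,
           d1.member_name,
           d2.provider_name,
           f.claim_amount
    FROM   staging.submission_fact f
    JOIN   conformed.member_dim    d1 ON f.member_key   = d1.member_key
    JOIN   conformed.provider_dim  d2 ON f.provider_key = d2.provider_key;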
