Hive / Hadoop Infra Engineer Resume
Danville, PA
SUMMARY
- Over 12 years of experience in the analysis, design, development, testing, and upgrading of applications, including hands-on experience with Big Data, Hive, and Java in implementing complete Hadoop solutions
- Deployed and maintained a Hadoop stack spanning over 2,500 nodes
- Implemented a data warehouse solution using Hive and Spark and enabled Hive ACID transactions for change data capture (CDC)
- Experience and expertise with DevOps tools such as Jenkins, Puppet, Git, and Gerrit
- Cloudera Certified Developer for Apache Hadoop (CCDH, exam CCD-410)
- Strong experience and expertise with the ETL tools MetaSuite and DataStage; the databases DB2, SQL Server, Oracle, Netezza, and Sybase; and Java and Python
- Expert in understanding data and designing/implementing enterprise platforms such as Hadoop data lakes and large data warehouses
- Expertise in writing Hadoop jobs for analyzing data using Hive and Pig
- Hands-on experience installing, configuring, and using Apache Hadoop ecosystem components such as HDFS, MapReduce, ZooKeeper, Oozie, Hive, Sqoop, and Pig
- Set up Ranger policies to streamline Hadoop access for different user groups across the cluster (a REST sketch follows this list)
- Hands-on experience with the Cloudera and Hortonworks Hadoop distributions (HDFS, MapReduce, Hive, Pig, Spark, Sqoop, Oozie, HBase, Flume, Kafka)
- Experience in the ETL migration process; involved in an EDW offload project
- Expertise in ingesting data from various RDBMS systems into the Hadoop platform using Sqoop
- Experience with the Spark platform in Scala and Python
- In-depth understanding of Hadoop architecture and its components, including HDFS, YARN, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts
- Good understanding of NoSQL databases such as HBase
- Set up the configuration and worked with Flume to ingest data from a spooling directory into HDFS
- Experience working on high-availability, high-traffic applications
- Expertise in bug fixing and problem solving
- Ability to move data in and out of Hadoop from various RDBMS, UNIX, and mainframe systems using Sqoop and other traditional data-movement technologies
- Able to assess business rules, collaborate with upstream source owners, and perform source-to-target data mapping, design, and review
- Experience in writing Pig and Hive scripts and extending Hive and Pig core functionality by writing custom UDFs.
- Experience with the Oozie workflow engine and Autosys for running workflow jobs with actions that execute Hadoop MapReduce, Pig, and Hive scripts
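A minimal sketch of the Ranger policy setup noted above, assuming Ranger's public REST API v2; the host, credentials, service name, database, and group are illustrative placeholders:

```python
import requests

# Hypothetical Ranger admin endpoint and credentials -- placeholders, not real values.
RANGER_URL = "http://ranger-admin:6080/service/public/v2/api/policy"
AUTH = ("admin", "admin")

# Grant the 'analysts' group read access to every table in a Hive database.
policy = {
    "service": "hadoopdev_hive",  # assumed Ranger service name
    "name": "analysts_read_sales",
    "resources": {
        "database": {"values": ["sales"], "isExcludes": False, "isRecursive": False},
        "table": {"values": ["*"]},
        "column": {"values": ["*"]},
    },
    "policyItems": [{
        "groups": ["analysts"],
        "accesses": [{"type": "select", "isAllowed": True}],
    }],
}

resp = requests.post(RANGER_URL, json=policy, auth=AUTH)
resp.raise_for_status()
print("Created policy id:", resp.json().get("id"))
```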
TECHNICAL SKILLS
- Linux, MVS (z/OS), CentOS, UNIX
- Apache Hadoop, HDFS, MapReduce, Hive, HBase, HiveQL, Spark, Sqoop, Flume, Pig, Oozie, ZooKeeper, Ranger, Ambari, Kafka
- Jenkins, Puppet, GIT, Gerrit
- DB2, HBase, Cassandra, SQL Server, Oracle, Netezza, Sybase, MySQL
- MetaSuite, DataStage
- Oozie, TWS, JobTrac, CA-7, Autosys, Nagios, OpenTSDB, Dr. Elephant
- ChangeMan, ClearCase, DCCS, PVCS Version Manager, Git, Gerrit
- Peregrine, QualityCenter (ALM), Remedy, JIRA
- Pega PRPC 6.2/6.3
- Core Java, Python
PROFESSIONAL EXPERIENCE
Confidential - Danville, PA
Hive / Hadoop Infra Engineer
Responsibilities:
- Worked closely with business analysts to convert business requirements into technical requirements and prepared low- and high-level design documentation
- Interacted with business analysts to define the business logic to implement in the data lake platform
- Designed and developed the solution, allocated tasks to developers, coordinated all tasks, and delivered components on time for implementation
- Developed and deployed DBTOOLKIT, a platform that collects Hadoop metrics and Hive metastore details using the pandas library in Python (a sketch follows this list)
- Designed, developed, and maintained the Geisinger Health Plan on the UDA platform, replicating the ODS data model and creating a data model to suit the requirements
- Automated the workflow process for the data model to derive Hive job dependencies and automated job execution using Python (see the dependency-runner sketch after this list)
- Developed and maintained applications that decommission source systems, operating on Hadoop technologies such as MapReduce, Hive, Spark, Spark SQL, HBase, and Elasticsearch using SQL, Python, and shell scripting
- Set up an ingestion pipeline to ingest data into the Hadoop platform from sources such as SQL Server and Teradata using Sqoop (an import sketch follows this list)
- Built an automated process to support the migration of reconciled datasets into the existing pipeline
- Created Hive UDFs to make data analytics easier for end users
- Developed the migration process using MapReduce, focusing on extracting, parsing, validating, type-checking, de-duplicating, and analyzing large datasets, which led to optimization of the ETL workflows
- Optimized Hive query performance by applying optimization techniques such as partitioning, bucketing, vectorization, appropriate file formats, and compression
- Improved the overall performance of the ETL process by managing and reviewing Hadoop log files and evaluating applications to support existing processes in production
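A minimal sketch of a DBTOOLKIT-style metastore collector, assuming a MySQL-backed Hive metastore queried through SQLAlchemy; the host, credentials, and connection string are placeholders, while the TBLS/DBS/TABLE_PARAMS tables reflect the stock metastore schema:

```python
import pandas as pd
from sqlalchemy import create_engine

# Connection to the metastore's backing MySQL database (placeholder credentials).
engine = create_engine("mysql+mysqlconnector://hive:secret@metastore-db/hive")

# Pull per-table details: owner, creation time, and row-count statistics.
query = """
    SELECT d.NAME AS db_name, t.TBL_NAME, t.OWNER, t.CREATE_TIME,
           p.PARAM_VALUE AS num_rows
    FROM TBLS t
    JOIN DBS d ON t.DB_ID = d.DB_ID
    LEFT JOIN TABLE_PARAMS p
           ON p.TBL_ID = t.TBL_ID AND p.PARAM_KEY = 'numRows'
"""
metrics = pd.read_sql(query, engine)

# Aggregate table counts per database for reporting.
print(metrics.groupby("db_name")["TBL_NAME"].count())
```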
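A minimal sketch of the dependency-driven Hive job runner, assuming job dependencies are declared up front and each job is a HiveQL script executed through beeline; the job names, script paths, and JDBC URL are illustrative:

```python
import subprocess
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical Hive job dependency map: job -> set of upstream jobs.
deps = {
    "load_claims": set(),
    "load_members": set(),
    "build_dim": {"load_members"},
    "build_fact": {"load_claims", "build_dim"},
}

JDBC_URL = "jdbc:hive2://hiveserver:10000/default"  # placeholder

# Run each job's HiveQL script only after its upstream jobs have finished.
for job in TopologicalSorter(deps).static_order():
    print(f"Running {job} ...")
    subprocess.run(
        ["beeline", "-u", JDBC_URL, "-f", f"/jobs/{job}.hql"],
        check=True,
    )
```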
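A minimal sketch of the Sqoop ingestion leg, driven from Python; the JDBC URL, credentials, table, and paths are placeholders:

```python
import subprocess

# Ingest a SQL Server table into HDFS with Sqoop (all identifiers are placeholders).
subprocess.run(
    [
        "sqoop", "import",
        "--connect", "jdbc:sqlserver://sqlhost:1433;databaseName=claims",
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop.pwd",  # keep credentials off the command line
        "--table", "member",
        "--target-dir", "/landing/claims/member",
        "--split-by", "member_id",                  # column used to parallelize mappers
        "--num-mappers", "4",
        "--as-parquetfile",
    ],
    check=True,
)
```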
Environment: Hortonworks 3.0.1, MapReduce Framework, Spark, SQL, Hive, HBase, Jenkins, IntelliJ, GitHub
Confidential - San Jose, CA
Hive / Hadoop Infra Engineer
Responsibilities:
- Built fault-tolerant, scalable batch and real-time distributed data processing systems using Hive, MapReduce, Tez, HBase, Java/Python, Spark, etc.
- Debugged and troubleshot Hive queries against big data sets
- Deployed the tunnel setup to connect MongoDB to Hadoop in a different zone (a sketch follows this list)
- Implemented a data warehouse solution using Hive and Spark, enabled Hive ACID transactions for CDC, and worked with Hive ACID tables
- Automated metrics gathering using Python and automated the execution process using shell scripting and crontab (see the sketch after this list)
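A minimal sketch of the tunnel setup, assuming an SSH tunnel built with the sshtunnel package and a pymongo client; hosts, ports, and the key path are placeholders:

```python
from sshtunnel import SSHTunnelForwarder
from pymongo import MongoClient

# Forward a local port through a bastion host to MongoDB in the other zone
# (all hosts, ports, and the key path are placeholders).
with SSHTunnelForwarder(
    ("bastion.other-zone.example", 22),
    ssh_username="etl",
    ssh_pkey="/home/etl/.ssh/id_rsa",
    remote_bind_address=("mongo.other-zone.example", 27017),
    local_bind_address=("127.0.0.1", 27017),
):
    client = MongoClient("mongodb://127.0.0.1:27017/")
    # Read from MongoDB here, then land the documents in HDFS downstream.
    print(client["claims"]["events"].estimated_document_count())
```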
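A minimal sketch of the cron-driven metrics automation; the metric collected and the output paths are illustrative:

```python
#!/usr/bin/env python3
# collect_metrics.py -- scheduled from crontab, e.g.:
#   */15 * * * * /usr/bin/python3 /opt/tools/collect_metrics.py >> /var/log/metrics.log 2>&1
import json
import shutil
import time

# Gather a simple host metric (disk usage of the data mount -- path is a placeholder).
usage = shutil.disk_usage("/data")
record = {
    "ts": int(time.time()),
    "disk_total": usage.total,
    "disk_used": usage.used,
}

# Append one JSON record per run for downstream aggregation.
with open("/var/log/metrics.jsonl", "a") as f:
    f.write(json.dumps(record) + "\n")
```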
Confidential, CA
Hive Infrastructure Engineer
Responsibilities:
- Built fault-tolerant, scalable batch and real-time distributed data processing systems using Hive, MapReduce, Tez, HBase, Java/Python, Kafka, Spark, etc.
- Maintained and supported the existing Apache Hive platform and evolved it to newer tech stacks and architectures
- Debugged, troubleshot, and optimized Hive queries against big data sets
- Provided operational excellence through root-cause analysis and continuous improvement
- Experience designing and optimizing data queries in the Hadoop environment using tools such as Hive EXPLAIN and Dr. Elephant
- Configuration/Change Management through Puppet
- Worked with the Data Infrastructure team to set up Kafka to ingest RTBIDS data into HDFS (a consumer sketch follows this list)
- Deployed Hadoop stack components such as Hive, Spark, and Tez through Jenkins
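A minimal sketch of the Kafka-to-HDFS ingestion, assuming the kafka-python and hdfs (WebHDFS) packages; brokers, topic, URLs, and paths are placeholders, and the target file is assumed to already exist for append:

```python
from kafka import KafkaConsumer
from hdfs import InsecureClient

# Consume bid events and append them to a file in HDFS
# (brokers, topic, WebHDFS URL, and paths are placeholders).
consumer = KafkaConsumer(
    "rtbids",
    bootstrap_servers=["kafka1:9092", "kafka2:9092"],
    group_id="rtbids-hdfs-sink",
    auto_offset_reset="earliest",
)
hdfs = InsecureClient("http://namenode:9870", user="etl")

batch = []
for msg in consumer:
    batch.append(msg.value.decode("utf-8"))
    if len(batch) >= 1000:
        # Flush the batch as newline-delimited records.
        with hdfs.write("/landing/rtbids/events.jsonl",
                        append=True, encoding="utf-8") as writer:
            writer.write("\n".join(batch) + "\n")
        batch.clear()
```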
Confidential, Berkeley Heights, NJ
Hadoop/Big Data WS Lead / Hadoop/Big Data Senior Developer
Responsibilities:
- Curated data may consist of flat denormalized tables or a star schema
- NGE submission data is augmented with all attributes needed for linking to the CDH conformed dimensions in Netezza
- Data in the curated zone is transformed by business rules and data rules
- Data is provisioned to Netezza through Sqoop export and pulled through DataStage jobs (a Sqoop export sketch follows)
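A minimal sketch of the Sqoop export leg; the Netezza JDBC URL, credentials, table, and export directory are placeholders:

```python
import subprocess

# Export curated data from HDFS to Netezza with Sqoop (identifiers are placeholders).
subprocess.run(
    [
        "sqoop", "export",
        "--connect", "jdbc:netezza://nzhost:5480/CDH",
        "--username", "etl_user",
        "--password-file", "/user/etl/.sqoop.pwd",
        "--table", "CONFORMED_DIM_MEMBER",
        "--export-dir", "/curated/nge/member",
        "--input-fields-terminated-by", "|",  # delimiter of the curated files
        "--num-mappers", "4",
    ],
    check=True,
)
```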