Sr. Hadoop Administrator Resume
Phoenix, AZ
SUMMARY:
- 11+ years of overall experience in analysis, design, development, testing, implementation, maintenance and enhancements on various IT projects, including experience implementing end-to-end Big Data solutions on Hadoop.
- Excellent experience developing components with Apache Hadoop ecosystem tools such as MapReduce, Hive, HBase, Pig, Sqoop, Spark, Kafka, Flume, ZooKeeper, Oozie, and Storm.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MRv1 and MRv2 (YARN).
- Experienced with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining, machine learning and advanced data processing.
- Expertise in writing Apache Spark Streaming applications against Big Data distributions in active cluster environments.
- Experienced in implementing a log producer in Scala that watches application logs, transforms incremental log records and sends them to a Kafka and ZooKeeper based log collection platform.
- Experienced in working with Flume to load log data from multiple sources directly into HDFS.
- Excellent knowledge of building and scheduling Big Data workflows with Oozie and AutoSys.
- Experienced in importing and exporting data between HDFS and relational database systems (RDBMS) such as Teradata and DB2 using Sqoop, and loading the results into partitioned Hive tables.
- Designed and implemented Hive and Pig UDFs in Python for evaluating, filtering, loading and storing data (see the Python UDF sketch after this list).
- Developed simple to complex MapReduce streaming jobs in Python and wrote REST APIs in Python that integrate with Hive and Pig.
- Experienced in writing complex MapReduce programs that work with different file formats such as Text, SequenceFile, JSON and Avro.
- Expertise in NoSQL databases such as HBase, MongoDB and Cassandra.
- Experienced working with the Hortonworks and Cloudera distributions.
- Worked on Amazon Web Services (AWS), including EMR and EC2, which provide fast and efficient processing.
- Experience with Elasticsearch (fast indexing), Kibana (creating dashboards) and Splunk (log analysis and dashboards).
- Expertise in core Java, J2EE, multithreading, JDBC, Hibernate, Spring and shell scripting; proficient in using Java APIs for application development.
- Used Gradle to build and automate Hadoop jobs for containers.
- Implemented SOAP-based web services.
- Excellent understanding of the design and implementation of Teradata data warehousing solutions, Teradata Aster big data analytics and analytic applications.
- Good working experience using Spark SQL to manipulate DataFrames in Python.
- Good knowledge in NoSQL databases including Cassandra and MongoDB.
- Experience working on Cloudera, MapR and Amazon Web Services (AWS).
- Excellent understanding of how socket programming enables two or more hosts to communicate with each other.
- Created custom UDFs for Pig and Hive to bring Python/Java methods and functionality into Pig Latin and HiveQL.
- Designed and built solutions for real-time data ingestion using Kafka, Storm, Spark Streaming and various NoSQL databases.
- Developed REST API to call USPS address validation API.
- Developed REST APIs using Scala, Play framework and Akka.
- Extensive experience analyzing data using Hadoop ecosystem tools including HDFS, Hive, Pig, Sqoop, Flume, MapReduce, Spark, Kafka, HBase, Oozie, Solr and ZooKeeper.
- Knowledge of implementing Big Data workloads on Amazon Elastic MapReduce (Amazon EMR), processing and managing the Hadoop framework on dynamically scalable Amazon EC2 instances.
- Experience with Amazon Web Services, AWS command line interface, and AWS data pipeline.
- Experience writing SQL and PL/SQL queries and stored procedures for accessing and managing databases such as Oracle, SQL Server 2012/2014, MySQL, and IBM DB2.
- Hands-on experience in database tuning and query tuning.
- Extensive hands-on experience writing complex MapReduce jobs and Pig scripts and modeling data in Hive.
- Experience in converting MapReduce applications to Spark.
- Good working knowledge in cloud integration with Amazon Web Services components like EMR, EC2, S3 etc.
- Good knowledge in using job scheduling and workflow designing tools like Oozie.
- Experience working with BI teams to translate big data requirements into Hadoop-centric technologies.
- Experience performance-tuning the Hadoop cluster by gathering and analyzing the existing infrastructure.
- Good experience creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka and Flume.
- Experience handling messaging services using Apache Kafka.
- Experience fine-tuning MapReduce jobs for better scalability and performance.
- Integrated third party REST APIs into application using Akka-HTTP.
- Developed various MapReduce applications to perform ETL workloads on terabytes of data.
- Experienced in developing and implementing web applications using Java, J2EE, JSP, Servlets, JSF, HTML, DHTML, EJB, JavaScript, AJAX, JSON, JQuery, CSS, XML, JDBC and JNDI.
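To illustrate the Python UDF work above: a minimal sketch of a Hive streaming-style UDF, where Hive pipes tab-separated rows to the script on stdin and reads cleaned rows back from stdout. The script name and the (event_id, user_id, amount, event_ts) column layout are hypothetical.

```python
#!/usr/bin/env python
# clean_events.py - minimal sketch of a Hive streaming "UDF" in Python.
# Hive sends tab-separated rows on stdin; cleaned rows are written to stdout.
# The four-column layout (event_id, user_id, amount, event_ts) is assumed.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 4:
        continue                           # drop malformed rows
    event_id, user_id, amount, event_ts = fields
    try:
        amount = "%.2f" % float(amount)    # normalize the amount column
    except ValueError:
        continue                           # skip rows with a non-numeric amount
    print("\t".join([event_id, user_id.lower(), amount, event_ts]))
```

A script like this would typically be shipped to the cluster with ADD FILE and invoked from HiveQL via SELECT TRANSFORM (...) USING 'python clean_events.py'.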
TECHNICAL SKILLS:
Hadoop Technologies: Apache Hadoop, Cloudera Hadoop Distribution (HDFS and MapReduce)
Technologies: HDFS, YARN, MapReduce, Hive, Pig, Sqoop, Flume, Spark, Kafka, ZooKeeper, Oozie, CDH 4, CDH 5 and HDP 2.4.2
Hadoop Ecosystem: Hive, Pig, Sqoop, Flume, Zookeeper, Oozie.
Streaming Technologies: Spark, Kafka, Storm.
AWS: S3, EC2.
Java/J2EE Technologies: Core Java, Data Structures, Multithreading.
NoSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: Java, Linux shell scripting, Scala, Python.
Web Technologies: HTML, CSS, JavaScript, AJAX, JSP, DOM, XML
Databases: MySQL, Oracle, SQL Server, DB2; SQL, PL/SQL.
Application Servers: WebLogic, WebSphere, JBoss.
Software Engineering: Scrum, Agile methodologies
ETL: Talend.
Operating Systems: Windows, macOS, UNIX, Linux.
IDE Tools: Eclipse, IntelliJ IDEA
PROFESSIONAL EXPERIENCE:
Confidential, Phoenix, AZ
Sr. Hadoop Administrator
Responsibilities:
- Involved in Agile methodologies, daily scrum meetings and sprint planning.
- Integrate visualizations into a Spark application using Databricks and popular visualization libraries (ggplot, matplotlib).
- Involved in all phases of the Software Development Life Cycle (SDLC) and worked on all activities related to the development, implementation and support of Hadoop.
- Played a key role in the installation and configuration of various Hadoop ecosystem tools such as Solr, Kafka, Pig, HBase and Cassandra.
- Implemented multiple MapReduce jobs in Java for data cleansing and pre-processing.
- Wrote complex Hive queries and UDFs in Java and Python.
- Involved in implementing HDInsight 3.3 clusters, which are based on Spark 1.5.1.
- Responsible for data extraction and data ingestion from different data sources into the Hadoop data lake by creating ETL pipelines using Pig and Hive.
- Job duties involved the design and development of various modules in the Hadoop Big Data platform and processing data using MapReduce, Hive, Pig, Sqoop and Oozie.
- Designed, developed and tested MapReduce programs on mobile offer redemptions and sent the results to downstream applications such as HAVI.
- Extracted, transformed and loaded (ETL) data from multiple federated data sources (JSON, relational databases, etc.) with DataFrames in Spark.
- Extended the capabilities of DataFrames using user-defined functions in Python and Scala.
- Resolved missing fields in DataFrame rows using filtering and imputation (see the PySpark sketch after this list).
- Loaded log data into HDFS using Flume and Kafka and performed ETL integrations.
- Experience with Cloudera Hadoop upgrades and patches and installation of ecosystem products through Cloudera Manager, along with Cloudera Manager upgrades.
- Generated Tableau dashboards implementing quick/context filters and parameters.
- Proficient with Tableau Server, Tableau Desktop, Tableau Online.
- Analyzed user behavior by running HiveQL queries on big data logs.
- Strong understanding of dimensional data modeling, strong SQL optimization capabilities and metadata management (connections, data model, VizQL model).
- Worked with the security team troubleshooting connectivity issues with LDAP, Ranger, AD, Knox gateway, ODBC/JDBC, Kerberos accounts and keytabs.
- Worked with the data delivery team to set up new Hadoop and Linux users, setting up Kerberos principals and testing HDFS, Hive, Pig and MapReduce access for the new users on the Hortonworks and Cloudera platforms.
- Developed data pipeline using Flume, Sqoop to ingest customer behavioral data and purchase histories into HDFS for analysis.
- Continuous monitoring and managing the Hadoop cluster using Cloudera Manager.
- Used HiveQL for data analysis, creating tables and importing structured data into specified tables for reporting.
- Used Pig to perform data validation on data ingested with Sqoop and Flume, and pushed the cleansed data set into HBase.
- Participated in the development and implementation of the Cloudera Hadoop environment.
- Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Worked with Zookeeper, Oozie, and Data Pipeline Operational Services for coordinating the cluster and scheduling workflows.
- Designed and built the reporting application, which uses Spark SQL to fetch and generate reports on HBase table data.
- Created HBase column families to store various data types coming from various sources.
- Loaded data into the cluster from dynamically generated files
- Troubleshooting performance issues with ETL/SQL tuning.
- Developed and maintained continuous integration and deployment systems using Jenkins, Ant, Akka and Maven.
- Effectively used Git (version control) to collaborate with the Akka team members.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive (HQL) and Pig jobs.
- Loaded huge amounts of data into HDFS using Apache Kafka.
- Collected log data from web servers and integrated it into HDFS using Flume.
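A minimal PySpark sketch of the DataFrame ETL described above (paths, table names, column names and the normalization logic are hypothetical):

```python
# Minimal PySpark sketch of the DataFrame ETL described above.
# Paths, column names and the normalize_state logic are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("offer-etl").getOrCreate()

# Load a federated JSON source and an existing Hive table as DataFrames.
offers = spark.read.json("/data/raw/offers/")          # JSON on HDFS
customers = spark.read.table("warehouse.customers")    # Hive table

# Extend DataFrames with a user-defined function.
normalize_state = udf(lambda s: s.strip().upper() if s else None, StringType())
offers = offers.withColumn("state", normalize_state(col("state")))

# Resolve missing fields by filtering and simple imputation.
cleaned = (offers
           .filter(col("offer_id").isNotNull())
           .fillna({"redemption_count": 0}))

# Join and write the result back to HDFS for downstream reporting.
report = cleaned.join(customers, "customer_id", "left")
report.write.mode("overwrite").parquet("/data/curated/offer_redemptions/")
```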
Environment: MapReduce, HDFS, Hive, Pig, Hue, Oozie, Eclipse, HBase, Flume, Linux, Java, gcc 4.2, Git.
Confidential, Kansas City, KS
Sr. Hadoop Developer
Responsibilities:
- Implemented discretization and binning, data wrangling, cleaning, transforming, merging and reshaping of data frames using Python, processing files in near real time within a few seconds (see the pandas sketch after this list).
- Created EC2 instances and implemented large multi node Hadoop clusters in AWS cloud from scratch.
- Configured AWS IAM and Security Groups.
- Responsible for implementing Kerberos, creating service principals, user accounts, keytabs, & syncing with AD.
- Developed terraform template to deploy Cloudera Manager on AWS.
- Configured different Notifications on AWS Services.
- Installed and configured the Hadoop cluster using Puppet.
- Wrote an MR2 batch job to fetch the required data from the database and store it as a static CSV file.
- Wrote a Spark job to process files from Vision EMS and AMN Cache, identify violations and send them to Smarts as SNMP traps.
- Automated workflows using shell scripting to schedule Spark jobs via crontab.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive (HQL) and Pig jobs.
- Implemented best-income logic using Pig scripts and UDFs.
- Performed component unit testing using the Azure Emulator and analyzed escalated incidents within the Azure SQL database.
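A minimal pandas sketch of the discretization, binning, merging and reshaping work described above (file names, column names and bin edges are hypothetical):

```python
# Minimal pandas sketch of the data-wrangling work described above.
# File names, column names and bin edges are hypothetical.
import pandas as pd

orders = pd.read_csv("orders.csv")          # raw transactional extract
customers = pd.read_csv("customers.csv")    # reference data

# Cleaning: drop duplicates and rows missing the join key.
orders = orders.drop_duplicates().dropna(subset=["customer_id"])

# Discretization / binning of a numeric column into labeled buckets.
orders["amount_band"] = pd.cut(orders["amount"],
                               bins=[0, 50, 200, 1000, float("inf")],
                               labels=["low", "medium", "high", "very_high"])

# Merging and reshaping: join reference data, then pivot to a summary frame.
merged = orders.merge(customers, on="customer_id", how="left")
summary = merged.pivot_table(index="region", columns="amount_band",
                             values="amount", aggfunc="sum", fill_value=0)
print(summary.head())
```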
Environment: Hadoop, HDFS, Hive, Pig scripts, HBase, Sqoop, Oozie, Python, ZooKeeper, MapReduce, Java, IBM DataStage 8.5, Unica, DB2, Teradata, UNIX, MySQL, shell scripting, Control-M, SQL.
Confidential, New York City, NY
Hadoop Developer
Responsibilities:
- Involved in the architectural design of cluster infrastructure, resource mobilization, risk analysis and reporting.
- Installed and configured the BigInsights cluster with the help of IBM engineers.
- Commissioned and decommissioned data nodes and was involved in NameNode maintenance.
- Installed Kerberos security on the cluster for AAA (authentication, authorization and auditing).
- Deep, thorough understanding of ETL tools and how they can be applied in a Big Data environment; supported and managed Hadoop clusters on the Apache, Hortonworks, Cloudera and MapR distributions.
- Involved in loading data from the UNIX file system to HDFS and created custom Solr query components to enable optimal search matching.
- Performed regular backups and cleared logs from HDFS space to utilize data nodes optimally; wrote shell scripts for time-bound command execution (see the cleanup sketch after this list).
- Edited and configured HDFS and tracker parameters.
- Scripted the requirements using Big SQL and provided time statistics for running jobs.
- Performed code reviews of simple to complex MapReduce jobs using Hive and Pig.
- Monitored the cluster using the BigInsights ionosphere tool.
- Imported data from various data sources and parsed it into structured data region-wise and date-wise; analyzed the data with Hive queries and Pig scripts to study customer behavior.
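The HDFS housekeeping above was done with shell scripts; the sketch below expresses the same idea in Python by wrapping the standard hdfs dfs CLI. The /logs/app path and the 7-day retention window are assumptions.

```python
# Hypothetical housekeeping sketch: clear old log files from HDFS so data
# nodes are used optimally. The resume describes shell scripts; this is an
# equivalent Python wrapper around the standard `hdfs dfs` CLI.
# The /logs/app path and 7-day retention are assumptions.
import subprocess
from datetime import datetime, timedelta

RETENTION_DAYS = 7
cutoff = datetime.now() - timedelta(days=RETENTION_DAYS)

# `hdfs dfs -ls` prints: permissions replicas owner group size date time path
listing = subprocess.run(["hdfs", "dfs", "-ls", "/logs/app"],
                         capture_output=True, text=True, check=True)

for line in listing.stdout.splitlines():
    parts = line.split()
    if len(parts) < 8:
        continue                      # skip the "Found N items" header line
    modified = datetime.strptime(parts[5] + " " + parts[6], "%Y-%m-%d %H:%M")
    path = parts[7]
    if modified < cutoff:
        # -skipTrash frees space immediately; drop it to keep a safety net.
        subprocess.run(["hdfs", "dfs", "-rm", "-r", "-skipTrash", path],
                       check=True)
```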
Environment: BDA 4.8, CDH 5.10.1, Hadoop, Spark, HDFS, Hadoop Pig Scripts, Scala, Sqoop, Hue, Hive, Impala, Oozie, Java, Rally, UNIX, Parquet, Snappy compression, Python, Shell Scripting, SQL.
Confidential, Charlotte, NC
Hadoop Developer
Responsibilities:
- Worked in an Agile methodology and used iceScrum for development and project tracking.
- Worked with HQL and the Criteria API for retrieving data elements from the database.
- Hands-on experience with cluster upgrades and patch upgrades without any data loss and with proper backup plans.
- Configured different Notifications on AWS Services.
- Responsible for the installation of various Hadoop ecosystem components and Hadoop daemons.
- Working experience maintaining MySQL databases: creating databases, setting up users and maintaining backups.
- Loaded huge amounts of data into HDFS using Apache Kafka.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Constructed system components and developed the server-side part using Java, EJB and the Spring framework; involved in designing the data model for the system.
- Used J2EE design patterns such as DAO, Model, Service Locator, MVC and Business Delegate.
- Worked with cloud services such as Amazon Web Services (AWS) and was involved in ETL, data integration and migration.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala and Python (see the sketch after this list).
- Implemented proofs of concept (PoCs) using Kafka, Storm and HBase for processing streaming data.
- Created the automated build and deployment process for the application, re-engineered the setup for a better user experience, and led the effort to build a continuous integration system.
- Predicted consumer behavior, such as which products a particular user has bought, and made predictions/recommendations based on recognized patterns using Hadoop, Hive and Pig queries.
- Installed and configured Hadoop, MapReduce, and HDFS.
- Developed multiple MapReduce jobs using Java API for data cleaning and pre-processing.
- Imported and exported data between HDFS/Hive and an Oracle 11g database using Sqoop.
- Responsible for managing data coming from different sources.
- Implemented the Kerberos authentication protocol for the existing cluster.
- Involved in transforming data from mainframe tables to HDFS and HBase tables using Sqoop and Pentaho Kettle, and worked with Impala to analyze stored data.
- Involved in writing MapReduce programs and testing them with MRUnit.
- Installed and configured a local 3-node Hadoop cluster and set up a 4-node cluster on EC2.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
- Developed scripts and batch jobs to schedule an Oozie bundle (a group of coordinators) consisting of various Hadoop programs.
- Exported the analyzed data to relational databases using Sqoop for visualization and report generation.
- Installed and configured the VMware vSphere client, created virtual servers and allocated resources.
- Monitored the Hadoop cluster through Cloudera Manager, implemented alerts based on error messages and provided cluster usage metrics reports to management.
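A minimal sketch of converting a Hive/SQL aggregation into Spark RDD transformations, as referenced above; the sales.tsv layout (date, store, amount) and the HDFS path are hypothetical.

```python
# Minimal sketch of converting a Hive/SQL aggregation into Spark RDD
# transformations. The sales.tsv layout (date, store, amount) and the
# HDFS path are hypothetical.
from pyspark import SparkContext

sc = SparkContext(appName="hive-to-rdd-sketch")

# Equivalent Hive query:
#   SELECT store, SUM(amount) FROM sales GROUP BY store;
lines = sc.textFile("hdfs:///data/sales/sales.tsv")

totals = (lines
          .map(lambda line: line.split("\t"))
          .filter(lambda f: len(f) == 3)                 # drop malformed rows
          .map(lambda f: (f[1], float(f[2])))            # (store, amount)
          .reduceByKey(lambda a, b: a + b))              # SUM ... GROUP BY

for store, total in totals.collect():
    print(store, total)

sc.stop()
```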
Environment: ES-9000, Pentium, OS/390, Win NT, VS-COBOL II, VB, SQL 2000, TSO/ISPF, File-Aid, DB2, MQ Series, PVCS Version Manager & Endevor.
Confidential, Houston, TX
Hadoop Developer
Responsibilities:
- Monitoring the running MapReduce programs on the cluster.
- Responsible for loading data from UNIX file systems into HDFS.
- Installed and configured Hive.
- Worked with application teams to install Hadoop updates, patches, version upgrades as required.
- Installed and configured Hive, Pig, Sqoop and Oozie on the HDP 2.0 cluster.
- Implemented data virtualization platforms using the Denodo application in the pharmaceutical (Gxplearn environment) and banking domains.
- Involved in implementing high availability and automatic failover infrastructure to overcome the NameNode single point of failure, utilizing ZooKeeper services.
- Completed a proof of concept on Apache NiFi workflows in place of Oozie to automate data-loading tasks.
- Implemented NiFi flow topologies to perform cleansing operations before moving data into HDFS.
- Installed Apache Tez, a processing framework built on YARN, to increase performance.
- Experience deploying Apache Tez on top of YARN.
- Extensive data virtualization experience creating base views, business views, derived views and data sources using Denodo 6.0.
- Experience migrating business reports to Spark, Hive, Pig and MapReduce.
- Performed a Major upgrade in production environment from HDP 1.3 to HDP 2.0.
- Worked with big data developers, designers and scientists to troubleshoot MapReduce job failures and issues with Hive, Pig and Flume.
- Involved in Installation and configurations of patches and version upgrades.
- Involved in Hadoop Cluster environment administration that includes adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster Monitoring, troubleshooting.
- Supported MapReduce programs running on the cluster.
- Involved in HDFS maintenance and administration through the Hadoop Java API.
- Involved in creating Hive tables, loading them with data and writing Hive queries that invoke and run MapReduce jobs in the backend.
- Worked very closely with various teams within the American Airlines enterprise on Denodo data-consumption challenges such as proper Denodo driver installation and access issues.
- Installed and configured Pig.
- Experienced in Big Data technologies such as Hadoop, Cassandra, Presto, Spark, Flume, Storm, AWS and SQL.
- Wrote Pig scripts to process unstructured data and create structured data for use with Hive (see the sketch after this list).
- Developed Sqoop scripts to manage the interaction between Pig and the MySQL database.
- Developed scripts to automate data management end to end and synchronize data between all the clusters.
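The unstructured-to-structured step above was written in Pig; a comparable sketch as a Hadoop Streaming mapper in Python is shown below. The combined web access-log format is an assumption.

```python
#!/usr/bin/env python
# structure_logs.py - hypothetical sketch of turning unstructured web logs
# into structured, tab-separated rows that a Hive table can sit on top of.
# The resume describes doing this with Pig; this is the equivalent idea as a
# Hadoop Streaming mapper. The combined access-log format is an assumption.
import re
import sys

LOG_PATTERN = re.compile(
    r'(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3}) (\S+)')

for line in sys.stdin:
    match = LOG_PATTERN.match(line)
    if not match:
        continue                      # skip lines that do not parse
    ip, timestamp, method, url, status, size = match.groups()
    size = "0" if size == "-" else size
    print("\t".join([ip, timestamp, method, url, status, size]))
```

A mapper like this would typically run through the hadoop-streaming jar (hadoop jar hadoop-streaming.jar -mapper structure_logs.py -input ... -output ...), producing tab-separated rows that an external Hive table can query.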
Environment: Hadoop, HDFS, HBase, Sqoop, Hive, MapReduce, Spark Streaming/SQL, Scala, Kafka, Solr, sbt, Java, Python, Ubuntu/CentOS, MySQL, Linux, GitHub, Maven, Jenkins.
Confidential, Cedar Rapids, IA
Hadoop Administrator.
Responsibilities:
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data using Lambda Architecture.
- Experience in deploying data from various sources into HDFS and building reports using Tableau.
- Developed a data pipeline using Kafka and Storm to store data in HDFS (see the streaming sketch after this list).
- Developed REST APIs using Scala and the Play framework to retrieve processed data from a Cassandra database.
- Performed real time analysis on the incoming data.
- Re-engineered n-tiered architecture involving technologies like EJB, XML and JAVA into distributed applications.
- Explored the possibilities of using technologies like JMX for better monitoring of the system.
- Configured, deployed and maintained multi-node Dev and Test Kafka clusters.
- Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Loaded the data into Spark RDDs and performed in-memory computation to generate the output response.
- Loaded data into HBase using bulk load and non-bulk load.
- Responsible for cluster maintenance, monitoring, managing, commissioning and decommissioning data nodes, troubleshooting, reviewing data backups, and managing and reviewing log files on Hortonworks.
- Added and removed components through Cloudera Manager.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Performed major and minor upgrades and patch updates.
- Created and managed cron jobs.
- Installed Hadoop ecosystem components such as Pig, Hive, HBase and Sqoop in the cluster.
- Experience in setting up tools like Ganglia for monitoring Hadoop cluster.
- Handling the data movement between HDFS and different web sources using Flume and Sqoop.
- Extracted files from NoSQL databases such as HBase through Sqoop and placed them in HDFS for processing.
- Installed Oozie workflow engine to run multiple Hive and Pig jobs.
- Building and maintaining scalable data pipelines using the Hadoop ecosystem and other open source components like Hive and HBase.
- Installed and configured Hue for high availability, pointing it at the Hadoop cluster in Cloudera Manager.
- Deep, thorough understanding of ETL tools and how they can be applied in a Big Data environment; supported and managed Hadoop clusters.
- Installed and configured MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
- Extensively worked on Informatica tool to extract data from flat files, Oracle and Teradata and to load the data into the target database.
- Responsible for developing data pipeline using HDInsight, Flume, Sqoop and Pig to extract the data from weblogs and store in HDFS.
- Commissioned data nodes as data grew and decommissioned data nodes from the cluster when hardware degraded.
- Set up and managed HA NameNode to avoid single points of failure in large clusters.
- Worked with data delivery teams to set up new Hadoop and Linux users, setting up Kerberos principals and testing HDFS and Hive access.
- Discussions with other technical teams on regular basis regarding upgrades, process changes, any special processing and feedback.
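A minimal Spark Streaming sketch of the Kafka-to-HDFS leg of the pipeline described above (the pipeline also used Storm); the broker address, topic name and output path are hypothetical, and the spark-streaming-kafka package is assumed to be on the classpath.

```python
# Minimal Spark Streaming sketch of the Kafka-to-HDFS leg of the pipeline
# described above (the resume also used Storm). Broker address, topic name
# and output path are hypothetical; requires the spark-streaming-kafka package.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafka-to-hdfs-sketch")
ssc = StreamingContext(sc, batchDuration=30)   # 30-second micro-batches

# Direct stream from Kafka; records arrive as (key, value) pairs.
stream = KafkaUtils.createDirectStream(
    ssc, ["events"], {"metadata.broker.list": "broker1:9092"})

# Keep the message payloads and land each batch as text files on HDFS.
stream.map(lambda kv: kv[1]) \
      .saveAsTextFiles("hdfs:///data/streaming/events")

ssc.start()
ssc.awaitTermination()
```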
Environment: Hue, Oozie, Eclipse, HBase, Flume, Splunk, Linux, Java, Hibernate, JDK, Kickstart, Puppet, PDSH, Chef, gcc 4.2, Git, Cassandra, NoSQL, Red Hat, CDH 4.x, Impala, MySQL, MongoDB, Nagios.