Hadoop Admin Resume
PA
SUMMARY
- 9+ years of IT experience as a Big Data Admin, Developer, Designer and Quality Tester, with cross-platform integration experience using Hadoop, Java, J2EE and software functional testing.
- Hands-on experience with Hadoop, HDFS, MapReduce and Hadoop ecosystem tools such as Pig, Hive, Impala, Oozie, Zookeeper, Sqoop, Flume, Spark, Kafka and HBase.
- Hands-on experience with the Cloudera and Hortonworks Hadoop distributions.
- Hands-on experience installing and configuring Apache Hadoop ecosystems using Cloudera Manager, Apache Ambari, Puppet and Chef.
- Strong understanding of Hadoop services and the MapReduce and YARN architectures.
- Responsible for writing MapReduce programs.
- Experienced in importing and exporting data into HDFS using Sqoop.
- Loaded log data into HDFS using Flume.
- Experienced in loading data into Hive partitions and creating buckets in Hive.
- Implemented logical data models for, and interacted with, HBase.
- Developed MapReduce jobs to automate data transfer from HBase.
- Wrote MapReduce programs using Hadoop, Pig, Hive and Scala.
- Expertise in analysis using Pig, Hive and MapReduce.
- Worked on installation and configuration across multiple environments.
- Experienced in developing UDFs for Hive using Java.
- Strong understanding of NoSQL databases such as HBase, MongoDB and Cassandra.
- Familiar with handling complex data processing workflows using Oozie.
- Scheduled all Hadoop/Hive/Sqoop/HBase jobs using Oozie.
- Experienced in SQL; worked on databases including Oracle, IBM DB2, MySQL and MongoDB.
- Quick learner with fluent communication and productive interpersonal skills, able to understand and meet group requirements efficiently.
- Dedicated to successful project completion with the ability to work in a team or as an individual, and as a liaison between different teams
- Experienced in setting up clusters on Amazon EC2 and S3, including automating cluster setup and extension in the AWS cloud.
- Developed core modules in large cross-platform applications using Java, J2EE, Spring, Struts, Hibernate, JAX-WS Web Services and JMS.
- Worked with debugging tools such as DTrace, truss and top. Expert in setting up SSH, SCP and SFTP connectivity between UNIX hosts.
- Good understanding of Scrum methodologies, Test Driven Development and continuous integration.
- Major strengths: familiarity with multiple software systems; ability to learn new technologies quickly and adapt to new environments; self-motivated, focused and adaptive team player with excellent interpersonal, technical and communication skills.
- Experienced in defining detailed application software test plans, including organization, participants, schedule, and test and application coverage scope.
- Experienced in gathering and defining functional and user interface requirements for software applications.
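The Sqoop import/export work summarized above typically takes the form below; a minimal sketch in which the host, database, table and target directory are hypothetical placeholders, not systems from this resume:

```shell
# Hypothetical example: import an "orders" table from MySQL into HDFS.
# All connection details and paths are illustrative placeholders.
sqoop import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --num-mappers 4 \
  --fields-terminated-by '\t'
```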
TECHNICAL SKILLS
Hadoop/Big Data: Hadoop, MapReduce, HDFS, Zookeeper, Kafka, Hive, Pig, Sqoop, Oozie, Flume, YARN, HBase, Spark with Scala
NoSQL Databases: HBase, Cassandra, MongoDB
Languages: C, C++, Java, J2EE, PL/SQL, Pig Latin, HiveQL, UNIX shell scripts
Java/J2EE Technologies: Applets, Swing, JDBC, JNDI, JSON, JSTL, RMI, JMS, JavaScript, JSP, Servlets, EJB, JSF, jQuery
Frameworks: MVC, Struts, Spring, Hibernate
Operating Systems: Sun Solaris, HP-UX, Red Hat Linux, Ubuntu Linux and Windows XP/Vista/7/8
Web Technologies: HTML, DHTML, XML, AJAX, WSDL, SOAP
Web/Application servers: Apache Tomcat, WebLogic, JBoss
Databases: Oracle 9i/10g/11g, DB2, SQL Server, MySQL, Teradata
Tools and IDEs: Eclipse, NetBeans, Toad, Maven, Ant, Hudson, Sonar, JDeveloper, Assent PMD, DB Visualizer
Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP
PROFESSIONAL EXPERIENCE
Confidential, PA
Hadoop Admin
Responsibilities:
- Worked on analyzing the Hadoop cluster and different big data analytic tools including Pig, the HBase database and Sqoop.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from LINUX file system to HDFS.
- Performed architecture design, data modeling and implementation of the Big Data platform and analytic applications for the consumer products.
- Analyzed the latest Big Data analytic technologies and their innovative applications in both business intelligence analysis and new service offerings.
- Worked on cluster installation, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slots configuration.
- Implemented test scripts to support test driven development and continuous integration.
- Worked on tuning the performance of MapReduce jobs.
- Implemented MapReduce using Hadoop, Pig, Hive and Scala.
- Responsible for managing data coming from different sources.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Experience in managing and reviewing Hadoop log files.
- Managed jobs using the Fair Scheduler.
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports for the BI team.
- Responsible for maintaining Content Management System on daily basis.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Created Oozie workflows to run multiple MapReduce, Hive and Pig jobs.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig and Sqoop.
Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Shell Scripting, Ubuntu, Red Hat Linux, Spark, Scala
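The DataNode commissioning/decommissioning work above usually follows the standard HDFS exclude-file procedure; a sketch, assuming `dfs.hosts.exclude` in hdfs-site.xml points at the exclude file shown (the hostname and path are placeholders):

```shell
# Sketch of decommissioning a DataNode; hostname and file path are placeholders.
# Assumes dfs.hosts.exclude in hdfs-site.xml is set to /etc/hadoop/conf/dfs.exclude.
echo "worker-node-07.example.com" >> /etc/hadoop/conf/dfs.exclude
hdfs dfsadmin -refreshNodes      # NameNode begins re-replicating the node's blocks
hdfs dfsadmin -report            # watch the node's state move to "Decommissioned"
```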
Confidential
Hadoop Admin/Developer
Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Extensive experience designing and implementing data flow pipelines from RDBMS sources.
- Designed the ETL data pipeline flow to ingest data from RDBMS sources into Hadoop using shell scripts, Sqoop, SSIS packages and MySQL.
- Created Sqoop jobs to import data and load it into HDFS.
- Involved in setting up Multi Node cluster in Amazon Cloud by creating instances on Amazon EC2.
- Created MapReduce Jobs on Amazon Elastic Map Reduce (Amazon EMR).
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote Hive UDFs.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Devised procedures that solve complex business problems with due considerations for hardware/software capacity and limitations, operating times and desired results.
- Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
- Provided quick response to ad hoc internal and external client requests for data and experienced in creating ad hoc reports.
- Responsible for building scalable distributed data solutions using Hadoop.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, and loaded data into HDFS.
- Extracted the data from Teradata into HDFS using Sqoop.
- Analyzed the data by performing Hive queries and running Pig scripts to know user behavior like shopping enthusiasts, travelers, music lovers etc.
- Exported the patterns analyzed back into Teradata using Sqoop.
- Continuous monitoring and managing the Hadoop cluster through Cloudera Manager.
- Installed the Oozie workflow engine to run multiple Hive jobs.
- Developed Hive queries to process the data and generate the data cubes for visualizing.
- Processed real-time data with Spark.
Environment: Hadoop, MapReduce, HDFS, Hive, Oozie, Java (JDK 1.6), Cloudera, NoSQL, Oracle 11g, SQL*Plus, Toad 9.6, Windows NT, UNIX Shell Scripting, Spark.
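Repeatable Sqoop imports like those described above are often packaged as saved Sqoop jobs; a sketch with hypothetical connection details, showing an incremental-append job that remembers its last imported key between runs:

```shell
# Hypothetical saved Sqoop job for repeatable, incremental imports into HDFS.
# Connection string, table and directory are placeholders.
sqoop job --create orders_incremental -- import \
  --connect jdbc:mysql://db-host:3306/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders \
  --incremental append \
  --check-column order_id \
  --last-value 0

# Each execution imports only rows with order_id above the stored last-value.
sqoop job --exec orders_incremental
```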
Confidential
Hadoop Developer
Responsibilities:
- Worked on analyzing Hadoop cluster using different big data analytic tools including Pig, Hive, and MapReduce on EC2.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Worked on debugging, performance tuning of Hive & Pig Jobs.
- Implemented test scripts to support test driven development and continuous integration.
- Involved in running Hadoop jobs to process millions of records and apply compression techniques.
- Developed multiple MapReduce jobs in java for data cleaning and pre-processing.
- Worked on tuning the performance of Pig queries.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
- Moved Relational Database data using Sqoop into Hive Dynamic partition tables using staging tables.
- Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
- Involved in loading data from LINUX file system to HDFS.
- Imported and exported data into HDFS and HBase using Sqoop from MySQL.
- Experience working on processing semi-structured data using Pig and Hive.
- Supported MapReduce programs running on the cluster.
- Gained experience in managing and reviewing Hadoop log and JSON files.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive, Pig and Sqoop.
- Created and maintained Technical documentation for launching HADOOP Clusters and for executing Hive queries and Pig Scripts.
- Designing and documenting the project use cases, writing test cases, leading offshore team, and interacting with client.
- Experience with professional software engineering practices and best practices for the full software development life cycle including coding standards, code reviews, source control management and build processes.
Environment: Hadoop, HDFS, HBase, Pig, Hive, MapReduce, Sqoop, Oozie, LINUX, S3, EC2, AWS and Big Data.
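The staging-table-to-dynamic-partition Hive load described above can be sketched as follows; table and column names are illustrative, not taken from the original project:

```shell
# Sketch of loading a dynamically partitioned Hive table from a staging table.
# Tables and columns are hypothetical placeholders.
hive -e "
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE orders_partitioned PARTITION (order_date)
SELECT order_id, customer_id, amount, order_date  -- partition column must come last
FROM   orders_staging;
"
```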
Confidential
Hadoop Admin
Responsibilities:
- Attended requirements meetings with Business Analysts and Business Users.
- Worked on a multi-node clustered environment and set up Cloudera for the Hadoop ecosystem.
- Performed basic Hadoop Administration responsibilities including software installation, configuration, software upgrades, backup and recovery, commissioning and decommissioning data nodes, cluster setup, cluster performance and monitoring on a daily basis.
- Involved in analyzing system failures, identifying the root causes and recommending actions to be taken.
- Created user accounts and set user access in the Hadoop cluster.
- Configured Hadoop ecosystem tools including Pig, Hive, HBase, Sqoop, Kafka, Oozie, Zookeeper and Spark in the Cloudera environment.
- Performed capacity planning, performance tuning, cluster monitoring and troubleshooting.
- Created Hive tables, loaded data and wrote Hive queries that run internally as MapReduce jobs.
- Working experience importing and exporting data into HDFS and Hive using Sqoop.
- Created and managed database objects such as tables, indexes and views.
- Experienced in importing and exporting data from MySQL to Hive using Sqoop.
- Troubleshot cluster issues such as DataNodes going down, network failures and missing data blocks.
- Implemented Kerberos to authenticate all services in the Hadoop cluster and to manage security.
- Managed the Hadoop cluster and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and MapReduce.
- Hands-on setup of a data pipeline using the Kafka and Spark platforms.
- Assigned permissions on topics to different consumers and groups; managed Spark RDDs; worked with Datasets and DataFrames; saved data as Hive tables using the HCatalog server.
- Managed different file formats for Hive tables, such as Text, RCFile, ORC, SequenceFile, Parquet and Avro.
- Understanding of the AWS cloud computing platform and related services.
Environment: Hadoop, HDFS, Pig, Sqoop, HBase, Shell Scripting, Ubuntu, Linux Red Hat, Spark, Scala
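Granting consumers and groups permission on Kafka topics, as described above, is typically done with the `kafka-acls` tool; a sketch in which the principal, topic, group and ZooKeeper address are placeholders:

```shell
# Sketch: allow the "analytics" user to consume the "clickstream" topic
# as part of the "analytics-readers" group. All names are hypothetical.
kafka-acls --authorizer-properties zookeeper.connect=zk-host:2181 \
  --add --allow-principal User:analytics \
  --consumer --topic clickstream --group analytics-readers
```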
Confidential
Hadoop Developer/QA
Responsibilities:
- Experienced in importing and exporting data into HDFS and Hive using Sqoop.
- Developed Pig program for loading and filtering the streaming data into HDFS using Flume.
- Experienced in handling data from different data sets, joining them and pre-processing using Pig join operations.
- Moved bulk data into HBase using MapReduce integration.
- Developed MapReduce programs to clean and aggregate the data.
- Developed HBase data model on top of HDFS data to perform real time analytics using Java API.
- Developed different kinds of custom filters and handled pre-defined filters on HBase data using the API.
- Imported and exported data from Teradata to HDFS and vice-versa.
- Strong understanding of Hadoop eco system such as HDFS, MapReduce, HBase, Zookeeper, Pig, Hadoop streaming, Sqoop, Oozie and Hive
- Implemented counters on HBase data to count total records in different tables.
- Experienced in handling Avro data files by passing the schema into HDFS using Avro tools and MapReduce.
- Worked on custom Pig Loaders and Storage classes to work with a variety of data formats such as JSON, Compressed CSV, etc.
- Used Amazon Web Services to perform big data analytics.
- Implemented secondary sorting to sort reducer output globally in MapReduce.
- Implemented a data pipeline by chaining multiple mappers using ChainMapper.
- Created Hive dynamic partitions to load time-series data.
- Experienced in handling different types of joins in Hive, such as map joins, bucket map joins and sorted bucket map joins.
- Created tables, partitions and buckets, and performed analytics using Hive ad-hoc queries.
- Experienced in importing/exporting data into HDFS/Hive from relational databases and Teradata using Sqoop.
- Handled continuous streaming data from different sources using Flume, with HDFS as the destination.
- Integrated Spring schedulers with the Oozie client as beans to handle cron jobs.
- Experience with CDH distribution and Cloudera Manager to manage and monitor Hadoop clusters
- Actively participated in software development lifecycle (scope, design, implement, deploy, test), including design and code reviews.
- Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, RDBMS/DB, flat files, Teradata, MySQL, CSV, Avro data files.
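The bucket map joins mentioned above depend on session settings and on both tables being bucketed on the join key; a sketch with hypothetical table names:

```shell
# Sketch of the session settings behind a bucket map join.
# Both tables are assumed bucketed (and sorted, for the sort-merge variant)
# on customer_id; all table names are placeholders.
hive -e "
SET hive.optimize.bucketmapjoin=true;
SET hive.optimize.bucketmapjoin.sortedmerge=true;

SELECT /*+ MAPJOIN(d) */ o.order_id, d.name
FROM   orders_bucketed o
JOIN   dim_customer_bucketed d ON o.customer_id = d.customer_id;
"
```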