Big Data/Hadoop Developer Resume
NC
SUMMARY:
- 9+ years of experience in the IT industry in software development and integration of various applications, including experience with Hadoop, HDFS, MapReduce, and the Hadoop ecosystem (Pig, Hive, HBase).
- Hands-on experience in installing, configuring, and using Hadoop ecosystem components like Hadoop MapReduce, HDFS, HBase, Oozie, Hive, Sqoop, Pig, Zookeeper, and Flume.
- Good understanding of the Hadoop Distributed File System and ecosystem (MapReduce, Pig, Hive, Sqoop, and HBase).
- Technical expertise in Big Data/Hadoop: HDFS, MapReduce, Spark, Hive, Pig, Sqoop, Flume, Oozie, NoSQL databases (HBase), SQL, and UNIX scripting.
- Experience leveraging Hadoop ecosystem components, including Pig and Hive for data analysis, Sqoop for data migration, Oozie for scheduling, and HBase as a NoSQL data store.
- Experience in importing and exporting data with Sqoop between HDFS and relational database systems (a command-line sketch follows this summary).
- Experience in installing Hadoop clusters using different distributions: Apache Hadoop, Cloudera, and Hortonworks.
- Around 4 years of hands-on experience in setting up, configuring, and monitoring Hadoop clusters on Cloudera and Hortonworks distributions.
- Monitored Hadoop clusters using tools like Nagios, Ganglia, Ambari, and Cloudera Manager.
- Experience with Hadoop shell commands, writing MapReduce programs, and verifying, managing, and reviewing Hadoop log files.
- Installation, configuration, and administration of Hadoop clusters on major distributions such as Cloudera Enterprise (CDH3 and CDH4) and Hortonworks Data Platform (HDP1 and HDP2).
- Proficient in adding and configuring Zookeeper and Flume on an existing Hadoop cluster.
- Strong knowledge of Spark concepts like RDD operations, caching, and persistence.
- Experience in designing and implementing secure Hadoop clusters using Kerberos.
- In-depth knowledge of JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Created Hive tables over data stored in HDFS.
- Good knowledge of Hadoop cluster architecture and cluster monitoring.
- Used a "Generic Data Quality" template to create and test 67 data quality profiles that validate data correctness across an enterprise-wide application (enterprise data warehouse).
- Proficient with various Ab Initio Data Cleansing, Parallelism, Transformation and Multi File System techniques.
- Used Continuous Flow graphs to capture, in real time, business changes occurring in transactional sources and reflect them in the data warehouse.
- Constructed Continuous Flow graphs using continuous components (PUBLISH, MULTIPUBLISH, SUBSCRIBE) over message-queuing technology.
- Possess excellent working knowledge of the ACE framework, which allows creating "pset" values for multiple generic graphs in one instance.
- Expert in writing UNIX shell scripts, including Korn shell and Bourne shell scripts.
- Extensively wrote wrapper scripts in Korn shell for job scheduling (a wrapper sketch follows this summary).
- Practical experience working in multiple environments such as production, development, and testing.
- Expert knowledge of dimensional data models using star and snowflake schemas in relational, dimensional, and multidimensional modeling; creation of fact and dimension tables; OLAP and OLTP; and a thorough understanding of data warehousing concepts.
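The Sqoop import/export work referenced above typically follows the pattern below; a minimal command-line sketch, assuming an Oracle source, with the connection string, tables, and HDFS paths purely illustrative:

    # import a relational table into HDFS (connection details, table, and paths are illustrative)
    sqoop import \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user -P \
      --table CUSTOMERS \
      --target-dir /data/raw/customers \
      --num-mappers 4

    # export aggregated results from HDFS back to the relational database
    sqoop export \
      --connect jdbc:oracle:thin:@//dbhost:1521/ORCL \
      --username etl_user -P \
      --table CUSTOMER_SUMMARY \
      --export-dir /data/out/customer_summary

The -P flag prompts for the database password at run time instead of embedding it in the script.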
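The Korn shell wrapper scripts mentioned above generally follow a simple launch-log-check pattern; a minimal sketch with a hypothetical job name, paths, and Hive step, where a non-zero exit code lets the scheduler (Control-M/TWS) flag the failure:

    #!/bin/ksh
    # wrapper sketch for a scheduled Hive job (job name, paths, and HQL file are hypothetical)
    JOB_NAME=daily_customer_load
    LOG_FILE=/var/log/etl/${JOB_NAME}_$(date +%Y%m%d_%H%M%S).log

    echo "Starting ${JOB_NAME}" >> "${LOG_FILE}"
    hive -f /opt/etl/hql/${JOB_NAME}.hql >> "${LOG_FILE}" 2>&1
    RC=$?

    if [ ${RC} -ne 0 ]; then
        echo "${JOB_NAME} failed with return code ${RC}" >> "${LOG_FILE}"
        exit ${RC}    # non-zero exit lets the scheduler mark the job as failed
    fi

    echo "${JOB_NAME} completed successfully" >> "${LOG_FILE}"
    exit 0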
TECHNICAL SKILLS:
Hadoop/Big Data Technologies: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Cassandra, Oozie, Zookeeper, YARN, Kerberos, Ambari.
Programming Languages: SQL, Java, Python.
Database: Oracle 11G, DB2, SQL Server, Teradata 14.
Operating Systems: Windows XP/Vista/7, UNIX (HP-UX, AIX, Sun Solaris), MS-DOS, Linux
Data Warehouse Tools: Ab Initio (Co-Op 3.0, GDE 3.15), DQE (3.2.5), Continuous Flows.
Scripting: UNIX, Korn shell, JavaScript.
Scheduling Utilities: Control-M, CA7, TWS
PROFESSIONAL EXPERIENCE:
Confidential, NC
Big Data/Hadoop Developer
Responsibilities:
- Installed/Configured/Maintained Apache Hadoop clusters for application development and Hadoop tools like Hive, Pig, HBase, Zookeeper, Cassandra and Sqoop.
- Loaded streaming data into HDFS using Flume (a configuration sketch follows this section).
- Loaded data from HDFS into HBase to generate reports used for recommendation engines, ad targeting, and ad retargeting.
- Involved in Hadoop cluster maintenance, job monitoring, and troubleshooting.
- Debugged issues and jobs in the Hadoop cluster.
- Configured and maintained a 22-node Hadoop cluster for smaller environments.
- Involved in loading data to HDFS and Hive using Sqoop.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs (a submission sketch follows this section).
- Responsible for building scalable distributed data solutions using Hadoop.
- Collected metrics for Hadoop clusters using Ganglia and Ambari.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Installed and configured Pig and Hive Shell.
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups, and managing and reviewing Hadoop log files.
Environment: Hive, Pig, HBase, Zookeeper, Hortonworks HDP 2.3, Python, Spark, shell scripts, Flume, Sqoop, Oracle, and HDFS.
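A minimal sketch of the Flume setup referenced in this section, assuming an exec source tailing an application log, a memory channel, and an HDFS sink; the agent name, file paths, and NameNode URI are illustrative:

    # write a simple agent configuration (standard Flume source/channel/sink properties)
    cat > /etc/flume-ng/conf/a1.conf <<'EOF'
    a1.sources  = r1
    a1.channels = c1
    a1.sinks    = k1

    a1.sources.r1.type     = exec
    a1.sources.r1.command  = tail -F /var/log/app/events.log
    a1.sources.r1.channels = c1

    a1.channels.c1.type     = memory
    a1.channels.c1.capacity = 10000

    a1.sinks.k1.type                   = hdfs
    a1.sinks.k1.channel                = c1
    a1.sinks.k1.hdfs.path              = hdfs://namenode:8020/data/raw/events/%Y-%m-%d
    a1.sinks.k1.hdfs.fileType          = DataStream
    a1.sinks.k1.hdfs.useLocalTimeStamp = true
    EOF

    # start the agent
    flume-ng agent --name a1 --conf /etc/flume-ng/conf --conf-file /etc/flume-ng/conf/a1.conf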
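Oozie workflows for the Hive and Pig jobs were submitted through the standard Oozie CLI; a minimal submission sketch, with the hosts, HDFS workflow path, and property values hypothetical:

    # job.properties pointing at a workflow application already deployed to HDFS
    cat > job.properties <<'EOF'
    nameNode=hdfs://namenode:8020
    jobTracker=jobtracker:8021
    oozie.wf.application.path=${nameNode}/user/etl/workflows/daily_etl
    EOF

    # submit and start the workflow
    oozie job -oozie http://oozieserver:11000/oozie -config job.properties -run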
Confidential, NC
Big Data/Hadoop Developer
Responsibilities:
- Monitored the health of all processes related to NameNode HA, HDFS, YARN, Pig, Hive, and Spark using Cloudera Manager.
- Monitored disk, memory, heap, and CPU utilization on all master and slave machines using Cloudera Manager and took necessary measures to keep the cluster up and running on a 24/7 basis.
- Monitored all MapReduce write jobs running on the cluster using Cloudera Manager and ensured that they wrote data to HDFS without issues and that the data was evenly distributed over the cluster.
- Monitored all MapReduce read jobs running on the cluster using Cloudera Manager and ensured that they read data from HDFS without issues.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Involved in adding new nodes to the cluster and decommissioning nodes from the cluster.
- Provided statistics of all successfully completed jobs in a detailed report format.
- Provided statistics of all failed jobs in a detailed report format and worked on finding the root cause of and resolution to job failures due to disk errors, node issues, etc.
- Viewed the performance of the map and reduce tasks that make up a job using Cloudera Manager.
- Involved in analyzing system failures, identifying root causes, and recommending courses of action.
- Fine-tuned the JobTracker by changing a few properties in mapred-site.xml.
- Fine-tuned the Hadoop cluster by setting a proper number of map and reduce slots for the TaskTrackers.
- Migrated the Hadoop cluster from CDH 3.X.X to CDH 4.X.X.
- Integrated Kerberos into Hadoop to make the cluster stronger and more secure against unauthorized users.
- Configured user authentication for accessing web UI.
- Involved in installing Cloudera Manager, Hadoop, Zookeeper, HBase, Hive, Pig, etc.
- Processed unstructured data using Pig and Hive.
- Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to analyze HDFS data.
- Wrote Pig jobs to transform data and load it into HBase (a sketch follows this section).
Environment: Apache Hadoop, Java, MySQL, Windows, UNIX, Sqoop, Hive, Oozie.
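A minimal sketch of the Pig-to-HBase pattern used in this role; the input path, schema, and HBase table/column names are illustrative, and HBaseStorage treats the first field of the relation as the row key:

    # write and run a small Pig script that cleanses raw events and stores them into HBase
    cat > transform.pig <<'EOF'
    raw   = LOAD '/data/raw/events' USING PigStorage('\t')
            AS (event_id:chararray, user_id:chararray, amount:double);
    clean = FILTER raw BY event_id IS NOT NULL;
    -- event_id becomes the HBase row key; remaining fields map to the listed columns
    STORE clean INTO 'hbase://events'
          USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('cf:user_id cf:amount');
    EOF

    pig -f transform.pig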
Confidential, IL
Big Data/Hadoop Developer
Responsibilities:
- Installed, implemented and administered Hadoop & Hive setup in Ubuntu.
- Developed HiveQL scripts and Hadoop MapReduce scripts to load OLTP log data into the Hive database while transforming it per requirements, and created data analytics reports.
- Formulated Hive scripts to store data in both text and ORC format for better performance.
- Debugged Hadoop & Hive server errors and issues.
- Analyzed patterns of server failures.
- Participated in coding application programs in core Java and other scripting languages.
- Used several Hadoop and Hive libraries in development.
- Worked on performance tuning of Hive queries with partitioning and bucketing (a DDL sketch follows this section).
- Analyzed and troubleshot Hadoop logs.
- Worked on different features of Hive - external tables, views, partitioning, bucketing.
- Designed Ab Initio graphs whose core functionality is to extract data from source systems, cleanse the extracted data, and transform it according to business logic before loading it into the target Netezza database.
- Developed complex graphs with multiple Ab Initio components such as Join, Rollup, Lookup, Gather, Merge, Interleave, and Dedup Sorted.
Environment: Hadoop MapReduce, HiveQL, Pig Latin, ETL, Flume, Administration, Shell Script, Ab Initio (Co-Op - 2.0, GDE - 3.0), Ab Initio Data Profiler, UNIX, SQL.
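A minimal sketch of the Hive DDL pattern behind the text/ORC storage and partitioning/bucketing tuning described above; table names, columns, and HDFS paths are illustrative:

    # write and run the HiveQL: a text-format staging table, an ORC target, and the load step
    cat > web_logs_orc.hql <<'EOF'
    -- external text table over raw data already sitting in HDFS
    CREATE EXTERNAL TABLE web_logs_raw (ts STRING, user_id STRING, url STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    STORED AS TEXTFILE
    LOCATION '/data/raw/web_logs';

    -- ORC table, partitioned by date and bucketed by user for faster queries
    CREATE TABLE web_logs_orc (ts STRING, user_id STRING, url STRING)
    PARTITIONED BY (log_date STRING)
    CLUSTERED BY (user_id) INTO 32 BUCKETS
    STORED AS ORC;

    SET hive.exec.dynamic.partition=true;
    SET hive.exec.dynamic.partition.mode=nonstrict;
    SET hive.enforce.bucketing=true;

    INSERT OVERWRITE TABLE web_logs_orc PARTITION (log_date)
    SELECT ts, user_id, url, substr(ts, 1, 10) AS log_date
    FROM web_logs_raw;
    EOF

    hive -f web_logs_orc.hql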
Confidential, IL
Big Data Engineer Internship
Responsibilities:
- Worked on analyzing the Hadoop stack and different big data analytic tools, including Pig, Hive, the HBase database, and Sqoop.
- Gained hands-on experience in installing, configuring, and using Apache Hadoop ecosystem components such as MapReduce, Hive, Pig, Sqoop, Flume, YARN, Spark, Kafka, and Oozie.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries and Pig scripts.
- Imported data from various sources, performed transformations using Pig, loaded the data into HDFS, and extracted data from Teradata to HDFS using Sqoop.
- Used different file formats such as text files, SequenceFiles, and Avro.
- Used UDFs to implement business logic in Hadoop (a usage sketch follows this section).
- Utilized Agile Scrum Methodology to help manage and organize a team of developers with regular code review sessions.
Environment: Hadoop, MapReduce, HDFS, Hive, Pig, Sqoop, Oozie, Linux, XML, MySQL, HBase, Ab Initio (Co-Op - 2.0, GDE - 3.0).
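The UDF work in this role generally follows the standard Hive registration/usage pattern; a minimal sketch with a hypothetical jar path, function name, implementing class, and table (the UDF itself would be written in Java and built into the jar beforehand):

    # register a custom UDF jar and use the function in a query (names are hypothetical)
    hive -e "
      ADD JAR /opt/etl/lib/custom-udfs.jar;
      CREATE TEMPORARY FUNCTION mask_ssn AS 'com.example.hive.udf.MaskSSN';
      SELECT mask_ssn(ssn) FROM customers LIMIT 10;
    "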