Hadoop Developer Resume
Washington, D.C.
SUMMARY
- Overall 5 years of professional experience in the IT industry, including 2 years of experience in Hadoop development, solving problems and delivering high-quality results in a fast-paced environment, with additional experience in Core Java programming.
- In-depth understanding and knowledge of Hadoop architecture and various components such as HDFS, MapReduce, Kafka, Sqoop, Hive, Pig, Spark SQL, YARN, Hue, and HCatalog.
- Good knowledge of the Cloudera distribution of Apache Hadoop.
- Worked independently with Cloudera support on issues and concerns with the Hadoop cluster.
- Hands-on experience with messaging services such as JMS, Kafka, and Flume.
- Experience in NoSQL databases such as MongoDB, HBase, and Elasticsearch.
- Hands-on experience in developing Hive UDFs and UDAFs, and Pig macros and UDFs.
- Extensive experience in validating and cleansing data using Hive queries and Pig statements.
- Experience in writing robust/reusable Hive Queries for processing and analyzing large volumes of data.
- Read, processed, and stored data in parallel using the Hive Query Language.
- Good knowledge of systems built with Spark and Java.
- Strong analytical, problem-solving, and communication skills with the ability to work in a group or independently.
- Used Kafka and Spark Streaming for real-time stream processing (a brief sketch follows this list).
- Supported MapReduce programs running on the cluster.
- Experience in extracting source data from sequential files, XML, JSON, and other file formats, and transforming and loading it into the target data warehouse using Sqoop with Bash scripts.
- Experience in collecting, aggregating, and moving data from various sources using Kafka.
- Experience in using Spark for data manipulation, preparation, and cleansing.
- Hands-on experience with Spark Core, Spark SQL, and Spark Streaming using PySpark.
- Used Spark-SQL to perform transformations and actions on data residing in Hive and MongoDB.
- Worked on a Kerberized Hadoop cluster with 250 nodes on Cloudera distribution 5.4.5.
- Migrated existing data to Hadoop from RDBMS sources (MySQL, SQL Server, and Oracle) using Sqoop.
- Worked on external tables with proper partitions for efficiency and loaded the structured data into them.
- Experience in managing and reviewing Hadoop Log files.
- Responsible for 250+ RHEL servers in an enterprise environment: supported hardware and software issues in production, installed and configured software, applied patches, and troubleshot performance issues.
- Fine-tuned Linux systems for better performance, modifying kernel parameters to achieve optimal system performance.
- Used JMS and created MDBs, sender and receiver components, and test servlets to check program results.
- Experienced in using monitoring tools such as top, sar, vmstat, iostat, and netstat to identify resource issues on Linux servers and provide recommendations.
- Experience in development of Java applications.
- Work experience in developing programs with Core Java.
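As a minimal illustration of the Kafka and Spark Streaming experience noted above, the sketch below consumes a Kafka topic through the spark-streaming-kafka-0-10 integration and counts the records in each micro-batch. The broker address, topic name, and consumer group are placeholders, not details from any specific project.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public class KafkaStreamCount {
  public static void main(String[] args) throws InterruptedException {
    SparkConf conf = new SparkConf().setAppName("KafkaStreamCount");
    // 10-second micro-batches
    JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

    Map<String, Object> kafkaParams = new HashMap<>();
    kafkaParams.put("bootstrap.servers", "broker1:9092");        // assumed broker address
    kafkaParams.put("key.deserializer", StringDeserializer.class);
    kafkaParams.put("value.deserializer", StringDeserializer.class);
    kafkaParams.put("group.id", "stream-count");                  // assumed consumer group
    kafkaParams.put("auto.offset.reset", "latest");

    // Direct stream over an assumed "events" topic
    JavaInputDStream<ConsumerRecord<String, String>> stream = KafkaUtils.createDirectStream(
        jssc,
        LocationStrategies.PreferConsistent(),
        ConsumerStrategies.<String, String>Subscribe(Arrays.asList("events"), kafkaParams));

    // Count records per micro-batch and print the result on the driver.
    stream.count().print();

    jssc.start();
    jssc.awaitTermination();
  }
}
```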
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, Spark, HDFS, YARN, Hue, HCatalog.
Ingestion Tools: Sqoop, Kafka.
Databases: HBase, MongoDB, MySQL, Oracle.
Programming Languages: Java, Scala.
Scripting/Query Languages: HiveQL, Pig Latin, Bash, XML, HTML, CSS.
Web/Application Servers: Apache HTTP Server, Apache Tomcat.
Operating Systems: Linux (Red Hat, CentOS).
Virtualization: VMware vSphere, vCenter.
System Monitoring Tools: sar, vmstat, iostat, top, tcpdump, ps.
Cloud Technologies: Amazon Web Services (AWS): EC2, EMR, VPC, RDS, Auto Scaling, S3, AWS Import/Export.
PROFESSIONAL EXPERIENCE
Confidential - Washington D.C.
Hadoop Developer
Responsibilities:
- Involved in choosing the right configurations for Hadoop.
- Requirement gathering from the Business Partners and Subject Matter Experts.
- Played a major role in Hadoop cluster installation, configuration, and monitoring.
- Developed a data pipeline using Kafka, Spark, and HBase to ingest, process, and store data.
- Selected HBase as the database since the data was a natural fit for a NoSQL store.
- Wrote Kafka configuration files for importing streamed log data into HBase.
- Analyzed and defined the research strategy and determined the system architecture and requirements needed to achieve the goals.
- Developed multiple Kafka producers and consumers, and used ZooKeeper to maintain smooth data flow as per the software requirement specifications.
- Wrote a Java program to connect Kafka with Spark Streaming, using Eclipse on the Cloudera distribution.
- Configured Spark Streaming to consume ongoing data from Kafka and store the stream in HBase (a sketch of this step follows this list).
- Wrote a Scala program to verify that data was arriving in HBase.
- Used various Spark Transformations and Actions for cleansing the input data.
- Developed shell scripts to generate Hive CREATE TABLE statements from the data and to load the data into the tables.
- Wrote MapReduce jobs using the Java API and Pig Latin.
- Optimized HiveQL and Pig scripts by using Spark as the execution engine.
- Involved in writing custom MapReduce programs using the Java API for data processing.
- Involved in developing a linear regression model, built with Spark and the Scala API, to predict a continuous measurement and improve observations on the data.
- Worked on the development of data analysis in Spark.
- Implemented Spark scripts using Scala and Spark SQL to load Hive tables into Spark for faster data processing.
- Created Hive tables as per requirements, as internal or external tables, with appropriate static or dynamic partitions and bucketing for efficiency.
- Loaded and transformed large sets of structured and semi-structured data using Hive.
- Wrote Spark jobs in Scala to analyze customer data and sales history.
- Involved in designing the HBase row key and schema for HBase tables storing Text, JSON, Parquet, and Avro format files.
- Used Spark and Spark SQL with the Scala API to read the Parquet data and create tables in Hive (a second sketch follows this list).
- Developed Hive queries for the analysts.
- Developed Oozie workflows to automate the tasks of loading data into HDFS and pre-processing it with Pig.
- Implemented Spark jobs using Scala, DataFrames, and the Spark SQL API for faster testing and processing of data.
- Involved in performing analytics and visualization on the log data, estimating the error rate and studying the probability of future errors using regression models.
- Used Kafka to build a customer activity tracking pipeline as a set of real-time publish-subscribe feeds.
- Exported the analyzed data from HBase to Oracle using Sqoop for visualization and to generate reports for the BI team.
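A rough sketch of how streamed records could be persisted to HBase from Spark Streaming, as referenced above. The table name "web_logs", the column family "d", and the row-key scheme are assumptions for illustration, not the project's actual design.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.spark.streaming.api.java.JavaDStream;

public class HBaseStreamWriter {

  // Persist each micro-batch of log lines into an HBase table.
  public static void saveToHBase(JavaDStream<String> logLines) {
    logLines.foreachRDD(rdd ->
        rdd.foreachPartition(records -> {
          // One HBase connection per partition, closed automatically.
          Configuration conf = HBaseConfiguration.create();
          try (Connection connection = ConnectionFactory.createConnection(conf);
               Table table = connection.getTable(TableName.valueOf("web_logs"))) {
            while (records.hasNext()) {
              String line = records.next();
              // Row key: timestamp plus hash to spread writes; replace with the real key design.
              Put put = new Put(Bytes.toBytes(System.currentTimeMillis() + "-" + line.hashCode()));
              put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("raw"), Bytes.toBytes(line));
              table.put(put);
            }
          }
        }));
  }
}
```

Opening the connection inside foreachPartition keeps the non-serializable HBase client off the driver and amortizes connection cost over each partition.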
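The second sketch shows the general Spark SQL pattern for reading Parquet data and saving the result as a Hive table; the paths, table names, and column names are illustrative placeholders only.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ParquetToHive {
  public static void main(String[] args) {
    // Hive support lets saveAsTable create a table in the Hive metastore.
    SparkSession spark = SparkSession.builder()
        .appName("ParquetToHive")
        .enableHiveSupport()
        .getOrCreate();

    // Read Parquet files from an assumed HDFS location.
    Dataset<Row> sales = spark.read().parquet("/data/sales/parquet");
    sales.createOrReplaceTempView("sales_raw");

    // Aggregate with Spark SQL; column names are hypothetical.
    Dataset<Row> daily = spark.sql(
        "SELECT sale_date, SUM(amount) AS total_amount FROM sales_raw GROUP BY sale_date");

    // Persist the result as a Hive table for downstream analysts.
    daily.write().mode("overwrite").saveAsTable("analytics.daily_sales");

    spark.stop();
  }
}
```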
Environment: Hadoop, Cloudera, HDFS, Pig, Hive, Kafka, Sqoop, Spark, Scala, HBase, MySQL, Oozie, Shell Scripting, Red Hat Linux, Java.
Confidential - Durham, NC.
Jr. Hadoop Developer
Responsibilities:
- Involved in collecting and aggregating large amounts of streaming data into HDFS using Flume and defined channel selectors to multiplex data into different sinks.
- Created a Twitter application with Flume to fetch data from Twitter.
- Played a major role in implementing complex MapReduce programs to perform map-side joins using the distributed cache.
- Experienced in developing complex MapReduce programs against structured and unstructured data.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Converted existing SQL queries into HiveQL queries.
- Experienced in loading data into Hive and accessing the data from Hive.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data in various formats such as text, zip, XML, and JSON.
- Wrote multiple MapReduce programs for data extraction, transformation, and aggregation from multiple file formats including XML, JSON, CSV, and other compressed file formats (a sketch follows this list).
- Refined website clickstream data from Omniture logs and moved it into Hive.
- Developed programs using scripting languages such as Pig to manipulate the data.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Developed Pig UDFs for manipulating data according to business requirements and also worked on developing custom Pig loaders (a second sketch follows this list).
- A Pig UDF was required to extract area information from the large volumes of data received from the sensors.
- Maintained the project's tracking records.
- Created Hive tables according to the company requirements.
- Experience in working with very large data sets.
- Built programs that leverage the parallel processing capabilities of Hadoop and MPP platforms.
- Involved in the design, integration, and implementation of the MongoDB NoSQL database.
- Loaded data into the MongoDB NoSQL database.
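As a sketch of the MapReduce extraction and aggregation work described above, the job below counts clicks per page from CSV input. The field layout, class names, and input/output paths are assumptions for illustration.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ClickAggregation {

  // Mapper: parse each CSV record and emit (pageId, 1).
  public static class ClickMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    private static final LongWritable ONE = new LongWritable(1);
    private final Text pageId = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split(",");
      if (fields.length > 2) {          // assumed layout: userId,pageId,timestamp
        pageId.set(fields[1]);
        context.write(pageId, ONE);
      }
    }
  }

  // Reducer: sum the click counts for each page.
  public static class ClickReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
        throws IOException, InterruptedException {
      long sum = 0;
      for (LongWritable v : values) {
        sum += v.get();
      }
      context.write(key, new LongWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "click-aggregation");
    job.setJarByClass(ClickAggregation.class);
    job.setMapperClass(ClickMapper.class);
    job.setCombinerClass(ClickReducer.class);   // combiner reuses the reducer to cut shuffle volume
    job.setReducerClass(ClickReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```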
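The second sketch is a hypothetical Pig EvalFunc UDF of the kind described above, extracting an area code from a pipe-delimited sensor record; the record layout and class name are assumed.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical UDF that extracts the area code from a sensor record.
public class ExtractArea extends EvalFunc<String> {
  @Override
  public String exec(Tuple input) throws IOException {
    if (input == null || input.size() == 0 || input.get(0) == null) {
      return null;
    }
    // Assumed record layout: "<areaCode>|<sensorId>|<reading>"
    String record = input.get(0).toString();
    String[] parts = record.split("\\|");
    return parts.length > 0 ? parts[0] : null;
  }
}
```

After a REGISTER of the UDF jar in the Pig script, the function can be called as ExtractArea(line) inside a FOREACH ... GENERATE statement.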
Environment: Flume, Pig, Hive, MongoDB, Sqoop, and Cloudera Manager.
Confidential
Hadoop Administrator
Responsibilities:
- Collaborated with teams in Hadoop development on cluster planning, hardware requirements, server configurations, and network equipment to implement clusters on the Cloudera Distribution of Hadoop.
- Involved in the development and ongoing administration of the Hadoop infrastructure.
- Implemented commissioning and decommissioning of data nodes, updating the metadata of the name node, killing unresponsive task trackers, and dealing with blacklisted task trackers.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from Oracle, NoSQL, and various portfolios (a brief sketch follows this list).
- Created a Derby database to store the log files generated by Hive.
- Resolved tickets submitted by users, troubleshooting the documented errors and resolving them.
- Involved in creating Hive tables and loading and analyzing data using Hive queries.
- Supported in setting up QA environment and updating configurations for implementing scripts with Pig, Hive, and Sqoop.
- Created workflow using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Assisted in importing data to HDFS and exporting analyzed data to relational databases using Sqoop.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Automated scripts to monitor HDFS and HBase through cron jobs.
- Supported code/design analysis, strategy development, and project planning.
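A brief, hypothetical sketch of creating one of the HBase tables mentioned above through the Java Admin API (HBase 1.x style); the table and column-family names are placeholders.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreatePortfolioTable {
  public static void main(String[] args) throws Exception {
    // Picks up hbase-site.xml from the classpath.
    Configuration conf = HBaseConfiguration.create();
    try (Connection connection = ConnectionFactory.createConnection(conf);
         Admin admin = connection.getAdmin()) {
      TableName tableName = TableName.valueOf("portfolio_data");   // assumed table name
      if (!admin.tableExists(tableName)) {
        HTableDescriptor descriptor = new HTableDescriptor(tableName);
        descriptor.addFamily(new HColumnDescriptor("d"));          // single column family for the data
        admin.createTable(descriptor);
      }
    }
  }
}
```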
Environment: Oozie, Sqoop, Pig Latin, HBase, Oracle.
Confidential
Java Developer
Responsibilities:
- Participated in sprint planning and collaborated with product owners to identify and prioritize product and technical requirements.
- Used various Core Java techniques such as exception handling, data structures, and collections to implement various features and enhancements.
- Provide architectural solutions as needed across applications involved in the development.
- Coordinated multiple development teams to complete features.
- Developed new projects and enhancements and maintained existing programs to support the online application.
- Periodically communicated project status to stakeholders.
- Worked on design patterns and was involved in design decisions.
- Used JMS to connect the application in India with the regional services in the USA.
- Created the sender and receiver code in Java (a brief sketch follows this list).
- Developed a Message-Driven Bean so that when a customer in India received the courier, a message was sent to management.
- Developed a Hibernate persistence layer to simplify the Java application's interaction with databases such as Oracle.
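A minimal sketch of the JMS sender described above, assuming a JNDI-registered connection factory and queue; the lookup names and message text are placeholders for illustration.

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;
import javax.naming.InitialContext;

public class CourierStatusSender {
  public static void main(String[] args) throws Exception {
    // JNDI lookups; the names are assumptions.
    InitialContext ctx = new InitialContext();
    ConnectionFactory factory = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
    Queue queue = (Queue) ctx.lookup("jms/CourierStatusQueue");

    Connection connection = factory.createConnection();
    try {
      Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
      MessageProducer producer = session.createProducer(queue);
      TextMessage message = session.createTextMessage("Courier delivered to customer");
      producer.send(message);
    } finally {
      connection.close();
    }
  }
}
```

On the receiving side, the Message-Driven Bean's onMessage callback would react to this message and notify management.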
Environment: Core Java, JMS, Hibernate Framework