Hadoop Developer Resume
Overland Park, KS
SUMMARY:
- 8+ years of total software development experience with the Hadoop ecosystem, Big Data and data science analytical platforms, Java/J2EE technologies, database management systems and enterprise-level cloud-based computing and applications.
- Around 3 years of experience in the design and implementation of Big Data applications using the Hadoop stack (MapReduce, Hive, Pig, Oozie, Sqoop, Flume, HBase) and NoSQL databases.
- Hands-on experience writing complex MapReduce jobs and Pig scripts and in Hive data modeling.
- Experience creating batch-style distributed computing applications using Apache Spark and Flume.
- Hands-on experience performing analytics using Spark SQL.
- Hands-on experience with, and in-depth understanding of, the Hadoop architecture and its various components.
- Experience and in-depth understanding of analyzing data using HiveQL and Pig.
- Worked extensively with Hive DDL and the Hive Query Language (HQL); developed UDF, UDAF and UDTF functions and used them in Hive queries.
- Good hands-on experience with Pivotal's SQL-on-Hadoop query engine, HAWQ.
- In-depth understanding of NoSQL databases such as HBase.
- Proficient knowledge and hands-on experience writing shell scripts on Linux.
- Working knowledge and experience with Agile and Waterfall methodologies.
- Experience in importing and exporting data using Sqoop from Relational Database Systems to HDFS and vice-versa.
- Have a good understanding of Kafka.
- Experienced with job workflow scheduling and monitoring tools such as Oozie and Cisco Tidal.
- Experience using various Hadoop distributions (Pivotal, Hortonworks, MapR, etc.) to fully implement and leverage new Hadoop features.
- Expertise in Hadoop ecosystem tools including HDFS, YARN, MapReduce, Pig, Hive, Sqoop, Flume, Kafka, Spark, ZooKeeper and Oozie.
- Experienced in requirement analysis, application development, application migration and maintenance using the Software Development Life Cycle (SDLC) and Java/J2EE technologies.
- Experience in client/server and systems software design and development using Java/JDK, JavaBeans and J2EE technologies such as Spring, Struts, Hibernate, Servlets, JSP, JBoss, JavaScript and JDBC, and web technologies such as HTML, CSS, PHP and XML.
- Experienced in backend development using SQL and stored procedures on Oracle 9i, 10g and 11g.
- Worked with various tools and IDEs such as Eclipse, IBM Rational, Visio, Apache Ant, MS Office, PL/SQL Developer and SQL*Plus.
- Expertise in full life-cycle system development, requirement elicitation, and creating use cases, class diagrams and sequence diagrams.
- Conscientious team player and motivated to learn and apply new concepts. Always aspires to exceed client expectations and to effectively collaborate with several cross-functional teams.
- Worked with geographically distributed and culturally diverse teams, in roles that involved interaction with clients and team members.
TECHNICAL SKILLS:
Big Data/Hadoop Technologies: MapReduce, Pig, Hive, Sqoop, Flume, HDFS, Kafka, Oozie, HAWQ
NoSQL Databases: HBase
Real-Time/Stream Processing: Apache Spark, Apache Kafka
Programming Languages: Java, C++, C, SQL, PL/SQL, Python, Scala
Java Technologies: Servlets, JavaBeans, JDBC, JNDI, JTA, JPA, EJB 3.0
Frameworks: JUnit, JTest, LDAP
Databases: Oracle 8i/9i, MySQL, MS SQL Server, PostgreSQL
IDEs & Utilities: Eclipse, NetBeans
Web Dev. Technologies: HTML, XML
Protocols: TCP/IP, HTTP and HTTPS
Operating Systems: Linux, MacOS, Windows 8, Windows 7, Vista, XP, Windows 95/2000 and MS-DOS
PROFESSIONAL EXPERIENCE:
Confidential, Overland Park, KS
Hadoop Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using a Hadoop cluster environment with the Hortonworks distribution.
- Worked with Kafka and REST APIs to collect and load data onto the Hadoop file system, and used Sqoop to load data from relational databases.
- Implemented Talend jobs to load data from Excel sheets and integrated them with Kafka.
- Used Spark Streaming APIs to perform the necessary transformations and actions on the data received from Kafka and persisted the results into a Cassandra database (see the sketch at the end of this list).
- Developed Spark scripts in Scala and Python, writing custom RDD transformations and actions.
- Performed advanced procedures such as text analytics and processing, using the in-memory computing capabilities of Spark with Scala and Python.
- Worked with Python to develop analytical jobs using Spark's lightweight PySpark API.
- Worked with Avro and ORC file formats and compression techniques such as LZO.
- Used Hive to form an abstraction on top of structured data residing in HDFS and implemented partitions, dynamic partitions and buckets on Hive tables.
- Used the Spark API over Hadoop YARN as the execution engine for data analytics with Hive.
- Worked on migrating MapReduce programs into Spark transformations using Scala.
- Designed and developed data integration programs in a Hadoop environment with the NoSQL data store Cassandra for data access and analysis.
- Used the job management scheduler Apache Oozie to execute workflows.
- Used Ambari to monitor node health and job status and to run analytics jobs on the Hadoop clusters.
- Worked on Tableau to build customized interactive reports, worksheets and dashboards.
- Implemented Kerberos for strong authentication to provide data security.
- Involved in performance tuning of Spark jobs by using caching and taking full advantage of the cluster environment.
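A minimal PySpark Streaming sketch of the Kafka-to-Cassandra flow described above (illustrative only: the topic, keyspace, table and host names are hypothetical, and it assumes the Spark 1.6/2.x DStream API with the spark-streaming-kafka integration and the DataStax cassandra-driver package, not the exact production code):

    import json
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils  # spark-streaming-kafka package

    def save_partition(rows):
        """Write one partition of transformed records into Cassandra."""
        from cassandra.cluster import Cluster  # DataStax Python driver
        cluster = Cluster(["cassandra-host"])
        session = cluster.connect("analytics")          # hypothetical keyspace
        insert = session.prepare(
            "INSERT INTO events (event_id, event_type, payload) VALUES (?, ?, ?)")
        for row in rows:
            session.execute(insert, (row["id"], row["type"], json.dumps(row)))
        cluster.shutdown()

    sc = SparkContext(appName="kafka-to-cassandra")
    ssc = StreamingContext(sc, batchDuration=30)

    # Direct (receiver-less) Kafka stream; "events" is a placeholder topic.
    stream = KafkaUtils.createDirectStream(
        ssc, ["events"], {"metadata.broker.list": "kafka-broker:9092"})

    # Parse JSON messages, keep well-formed events, and persist each micro-batch.
    parsed = stream.map(lambda kv: json.loads(kv[1])).filter(lambda e: "id" in e)
    parsed.foreachRDD(lambda rdd: rdd.foreachPartition(save_partition))

    ssc.start()
    ssc.awaitTermination()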
Environment: Hadoop, HDP, Spark, Scala, Python, Kafka, Hive, Sqoop, Ambari, Mesos, Talend, Oozie, Cassandra, Tableau, Jenkins, Hortonworks, Amazon AWS and Red Hat Linux.
Confidential, KS
Hadoop Developer
Responsibilities:
- Built a Python script to extract data from HAWQ tables and generate a ".dat" file for the downstream application.
- Built a generic framework in Python to parse fixed-length raw data, driven by a JSON layout that describes the fixed positions of the fields, and load the data into HAWQ tables (see the sketch at the end of this list).
- Built a generic framework in Python that transforms two or more data sets in HDFS.
- Built generic Python frameworks around Sqoop/HAWQ to load data from SQL Server to HDFS and from HDFS to HAWQ.
- Performed extensive data validation, using HAWQ partitions for efficient data access.
- Built a generic Python framework that allows us to update data in HAWQ tables.
- Coordinated in all testing phases and worked closely with Performance testing team to create a baseline for the new application.
- Created automated workflows in Cisco Tidal that schedule daily data-loading and other transformation jobs.
- Created PostgreSQL functions (stored procedures) to populate the tables on a daily basis.
- Developed functions using PL/Python for various use cases.
- Wrote programs in Scala to support the Play framework and act as code-behind for the frontend application.
- Developed multiple Kafka topics/queues and produced 20 million records using producers.
- Wrote various data types, such as complex JSON, canonical JSON and XML, to Kafka topics.
- Developed the code for data ingestion and acquisition using Spring XD streams to Kafka.
- The data arrives as JSON, is converted into the Avro byte format and is then published to Kafka.
- Prepared technical design documents and production support documents.
- Worked with the SSIS and SSRS tools to aid in decommissioning data from SQL Server to the distributed environment.
- Wrote Python scripts to create automated workflows.
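A minimal sketch of the JSON-layout-driven fixed-width parser mentioned above (the field names, layout file and output path are illustrative, not the original framework; the resulting delimited file can then be loaded into a HAWQ table, for example through gpfdist external tables or a PostgreSQL COPY):

    import json

    def load_layout(path):
        """Layout example: [{"name": "acct_id", "start": 0, "length": 10}, ...]"""
        with open(path) as f:
            return json.load(f)

    def parse_line(line, layout, delimiter="|"):
        """Slice one fixed-width record into fields using the JSON layout."""
        fields = [line[f["start"]:f["start"] + f["length"]].strip() for f in layout]
        return delimiter.join(fields)

    def to_delimited(raw_path, layout_path, out_path):
        layout = load_layout(layout_path)
        with open(raw_path) as src, open(out_path, "w") as dst:
            for line in src:
                dst.write(parse_line(line.rstrip("\n"), layout) + "\n")

    if __name__ == "__main__":
        to_delimited("raw_feed.dat", "layout.json", "parsed_feed.psv")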
Environment: PHD 2.0, HAWQ 1.2, Sqoop 1.4, Python 2.6, SQL, Apache Kafka
Confidential, Philadelphia, PA
Hadoop Developer
Responsibilities:
- Pulled data from the data warehouse using Sqoop and placed it in HDFS.
- Wrote MapReduce jobs to join data from multiple tables and convert it to CSV files (see the sketch at the end of this list).
- Worked with the Play Framework to design the frontend of the application.
- Wrote programs in Scala to support the Play framework and act as code-behind for the frontend application.
- Wrote programs in Java, and at times Scala, to implement intermediate functionality such as event or record counts from HBase.
- Configured multiple remote Akka worker nodes and master nodes from scratch as per the software requirement specifications.
- Wrote Pig scripts to perform ETL transformations on the MapReduce-processed data.
- Involved in review of functional and non-functional requirements.
- Responsible for managing data coming from different sources.
- Wrote shell scripts to pull the necessary fields from huge files generated by MapReduce jobs.
- Converted ORC data from Hive into flat files using MapReduce jobs.
- Created Hive tables and worked on them using HiveQL.
- Supported the existing MapReduce programs running on the cluster.
- Followed agile methodology for the entire project.
- Prepared technical design documents and detailed design documents.
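A minimal Hadoop Streaming sketch of the table-join-to-CSV idea described above (the original jobs were written as Java MapReduce; the table names, field positions and the use of Python streaming here are illustrative assumptions):

    #!/usr/bin/env python
    # mapper.py -- tag each record with its source table, using the input file name
    # that Hadoop Streaming exposes through an environment variable.
    import os
    import sys

    source = os.environ.get("mapreduce_map_input_file",
                            os.environ.get("map_input_file", ""))
    tag = "A" if "orders" in source else "B"   # hypothetical table names

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        join_key, rest = fields[0], fields[1:]
        print("%s\t%s\t%s" % (join_key, tag, ",".join(rest)))

    #!/usr/bin/env python
    # reducer.py -- join the tagged records per key and emit CSV rows.
    import sys

    def flush(key, a_rows, b_rows):
        for a in a_rows:
            for b in b_rows:
                print("%s,%s,%s" % (key, a, b))

    current_key, a_rows, b_rows = None, [], []
    for line in sys.stdin:
        key, tag, rest = line.rstrip("\n").split("\t", 2)
        if current_key is not None and key != current_key:
            flush(current_key, a_rows, b_rows)
            a_rows, b_rows = [], []
        current_key = key
        (a_rows if tag == "A" else b_rows).append(rest)
    if current_key is not None:
        flush(current_key, a_rows, b_rows)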
Environment: Linux (Ubuntu), Hadoop 1.2.1 (pseudo-distributed mode), HDFS, Hive, Hortonworks, Flume.
Confidential, Los Angeles, CA
Hadoop Developer
Responsibilities:
- Converted the existing relational database model to the Hadoop ecosystem.
- Generated datasets and loaded them into the Hadoop ecosystem.
- Worked with Linux systems and RDBMS database on a regular basis to ingest data using Sqoop.
- Worked with Spark to create structured data from the pool of unstructured data received.
- Managed and reviewed Hadoop and HBase log files.
- Involved in review of functional and non-functional requirements.
- Responsible for managing data coming from various sources.
- Implemented Kafka Java producers, created custom partitions, configured brokers and implemented high-level consumers to build the data platform.
- Loaded CDRs from the relational DB using Sqoop, and data from other sources to the Hadoop cluster using Flume.
- Involved in loading data from the UNIX file system and FTP to HDFS.
- Designed and implemented Hive queries and functions for evaluation, filtering, loading and storing of data.
- Created Hive tables and worked on them using HiveQL.
- Wrote Spark code to convert unstructured data to structured data.
- Developed Hive queries to analyze the output data.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Developed Kafka consumers in Scala for consuming data from Kafka topics (see the sketch at the end of this list).
- Handled cluster coordination services through ZooKeeper.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Used Hive to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Designed and implemented Spark jobs to support distributed data processing.
- Supported the existing MapReduce programs running on the cluster.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond accordingly to any warning or failure conditions.
- Involved in Hadoop cluster tasks such as adding and removing nodes without any effect on running jobs or data.
- Followed agile methodology for the entire project.
- Installed and configured the Apache Hadoop, Hive and Pig environment.
- Prepared technical design documents and detailed design documents.
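A minimal consumer sketch for the Kafka-topic consumption described above, using the kafka-python package (the original consumers were written in Scala; the topic, broker and group names are placeholders):

    import json
    from kafka import KafkaConsumer

    consumer = KafkaConsumer(
        "cdr-events",                              # hypothetical topic
        bootstrap_servers=["kafka-broker:9092"],
        group_id="cdr-loader",
        auto_offset_reset="earliest",
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )

    for message in consumer:
        record = message.value
        # Hand each record to the downstream load step (e.g. staging in HDFS).
        print(message.topic, message.partition, message.offset, record.get("id"))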
Environment: Linux (Ubuntu), Hadoop 1.2.1 (pseudo-distributed mode), HDFS, Hive 0.12, Flume, Kafka, Hortonworks, Spark.
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Utilized Flume to filter the input data so that only the data needed for analytics was retained, by implementing Flume interceptors.
- Used Flume to transport logs to HDFS
- Worked on a Pig script to count the number of times each URL was opened over a given period; comparing the counts across URLs shows the relative popularity of each website among employees (see the sketch at the end of this list).
- Used Hive to pull out additional analytical information.
- Worked with partitioning and bucketing concepts in Hive and designed both managed and external tables for optimized performance.
- Involved in moving all log files generated from various sources to HDFS through Flume for further processing.
- Worked on Hue interface for querying the data.
- Involved in writing MapReduce programs for analytics
- Also used MapReduce to structure the data coming from Flume sinks.
- Managed and scheduled jobs on a Hadoop cluster using Oozie.
- Generated datasets and loaded them into the Hadoop ecosystem.
- Installed, configured and used Hadoop ecosystem components such as MapReduce, HDFS, Pig, Hive, Sqoop, Flume and HBase.
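A minimal Python sketch of the URL-popularity counting logic that the Pig script above implemented (the log path and URL field position are assumptions; the production job ran in Pig over HDFS):

    from collections import Counter

    def url_counts(log_path, url_field=6):
        """Count how often each URL appears in a web/proxy access log."""
        counts = Counter()
        with open(log_path) as log:
            for line in log:
                fields = line.split()
                if len(fields) > url_field:
                    counts[fields[url_field]] += 1
        return counts

    if __name__ == "__main__":
        for url, hits in url_counts("proxy_access.log").most_common(10):
            print("%8d  %s" % (hits, url))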
Environment: Hadoop, Cloudera Manager, MapReduce, Hive, Flume, Pig.
Confidential, Houston, TX
Java Developer
Responsibilities:
- Worked with several clients with day-to-day requests and responsibilities.
- Involved in analyzing system failures, identifying root causes and recommending courses of action.
- Integrated Struts, Hibernate and JBoss Application Server to provide efficient data access.
- Involved in HTML page development using CSS and JavaScript.
- Developed the presentation layer with JSF, JSP and JavaScript technologies.
- Designed table structures and coded scripts to create tables, indexes, views, sequences, synonyms and database triggers. Involved in writing database procedures, triggers and PL/SQL statements for data retrieval.
- Developed the UI components using jQuery and JavaScript.
- Designed the database and coded the PL/SQL stored procedures and triggers required for the project.
- Used the Session and FacesContext JSF objects to pass content from one bean to another.
- Designed and developed Session Beans to implement business logic.
- Tuned SQL statements and the WebSphere application server to improve performance and consequently meet the SLAs.
- Created the EAR and WAR files and deployed the application in different environments.
- Engaged in analyzing requirements, identifying various individual logical components, expressing the system design through UML diagrams using Rational Rose.
- Involved in running shell scripts for regression testing.
- Extensively used HTML and CSS in developing the front-end.
- Designed and Developed JSP pages to store and retrieve information.
Environment: Java, J2EE, JSP, JavaScript, JSF, Spring, XML, XHTML, Oracle 9i, PL/SQL, SOAP web services, WebSphere, JUnit, SVN.
Confidential
Graduate Trainee/Programmer Analyst
Responsibilities:
- Prepared program Specification for the development of PL/SQL procedures and functions.
- Created Custom Staging Tables to handle import data.
- Created custom triggers, stored procedures, packages and functions to populate different databases.
- Developed SQL*Loader scripts to load data into the custom tables.
- Ran batch files to load database tables from flat files using SQL*Loader.
- Created UNIX Shell Scripts for automating the execution process.
- Developed PL/SQL code for updating payment terms.
- Created indexes on tables and optimized stored procedure queries.
- Designed, developed and tested reports using SQL*Plus.
- Modified existing code and developed PL/SQL packages to perform specialized functions/enhancements on the Oracle application.
- Created Indexes and partitioned the tables to improve the performance of the query.
- Involved in preparing documentation and user support documents.
- Involved in preparing test plans, unit testing, system integration testing, implementation and maintenance.
Environment: Oracle 9i/10g, PL/SQL, SQL*Loader, SQL Navigator, SQL*Plus, UNIX, Windows NT, Windows 2000.
