Sr. Hadoop Developer Resume
Little Rock, AR
SUMMARY
- 6+ years of experience in all phases of the software development life cycle, specializing in Big Data technologies such as the Hadoop and Spark ecosystems and Java/J2EE technologies.
- Expertise in Big Data technologies including Hadoop, MapReduce, YARN, Flume, Hive, Pig, Sqoop, HBase, Pivotal, Cloudera, MapR, Avro, Spark, and Scala.
- Proficient in Hadoop architecture components such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and in the MapReduce programming paradigm.
- Experience developing MapReduce programs using Apache Hadoop for analyzing big data as per requirements.
- Proficient with the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing strategies, and writing and optimizing HiveQL queries (a sketch follows this list).
- Experience extending Pig and Hive functionality with custom UDFs for data analysis and file processing, and in running Pig Latin scripts.
- Experience in ingestion, storage, querying, processing, and analysis of Big Data, with hands-on experience in Apache Spark, Spark SQL, and Spark Streaming.
- Experience importing and exporting data between RDBMS databases (MySQL, Oracle, and DB2) and the Hadoop data lake using Sqoop jobs.
- Experience developing Sqoop jobs in incremental mode (both append and last-modified) and Sqoop merge scripts to handle incremental updates.
- Proficient in developing web-based user interfaces using HTML5, CSS3, JavaScript, jQuery, AJAX, XML, JSON, jQuery UI, Bootstrap, Angular, Node.js, and Ext JS.
- Expertise in working with various databases: writing SQL queries, stored procedures, functions, and triggers using SQL and PL/SQL.
- Experience in object-oriented analysis and design (OOAD) and software development using UML methodology, with good knowledge of J2EE and core Java design patterns.
- Working knowledge of Amazon Elastic Compute Cloud (EC2) for computational tasks and Simple Storage Service (S3) as a storage mechanism.
- Strong experience troubleshooting operating systems such as Linux (Red Hat) and UNIX, resolving cluster issues and Java-related bugs.
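For illustration, a minimal Spark/Scala sketch of the Hive partitioning and bucketing pattern referenced above; the dataset path, database, table, and column names are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object PartitionedHiveLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("PartitionedHiveLoad")
      .enableHiveSupport()                 // register the table in the Hive metastore
      .getOrCreate()

    // Source data previously landed in HDFS (hypothetical path and schema)
    val txns = spark.read.parquet("hdfs:///data/landing/transactions")

    // Partition by date and bucket by customer_id so that HiveQL filters and
    // joins on those columns scan fewer files
    txns.write
      .partitionBy("txn_date")
      .bucketBy(32, "customer_id")
      .sortBy("customer_id")
      .format("parquet")
      .mode("overwrite")
      .saveAsTable("analytics.transactions_partitioned")

    spark.stop()
  }
}
```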
TECHNICAL SKILLS
Big Data: HDFS, MapReduce, Pig, Hive, Spark, YARN, Kafka, Flume, Sqoop, Impala, Presto, Oozie, ZooKeeper.
Languages: Java, Python, Scala, C/C++, Go.
Java Technologies: JSE, Servlets, JavaBeans, JSP, JDBC, JNDI, AJAX, EJB, Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB
Web Design: HTML, DHTML, AJAX, JavaScript, jQuery, CSS, Angular, PHP
Build Tools: Eclipse, Jenkins, Git, Ant, Maven, IntelliJ, JUnit, and Log4j.
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
RDBMS: Teradata, Oracle, MS SQL Server, MySQL
PROFESSIONAL EXPERIENCE
Confidential - Little Rock, AR
Sr. Hadoop Developer
Responsibilities:
- Worked with application teams to install operating system and perform Hadoop updates, patches and version upgrades as required.
- Developed MapReduce programs for filtering out the unstructured data and developed multiple MapReduce jobs to perform data cleaning and pre-processing.
- Developed a Spark Streaming script that consumes topics from the distributed messaging source Kafka and periodically pushes batches of data to Spark for real-time processing.
- Extracted real-time data using Kafka and Spark Streaming and processed the stream of RDDs created from micro-batches.
- Used Spark Streaming APIs to perform transformations and actions that pull data from Kafka in near real time and persist it into HBase.
- Wrote Hive queries for transformations on the data to be used by downstream models.
- Optimized Hive tables using techniques such as partitioning and bucketing to improve the performance of HiveQL queries.
- Developed a framework to load and transform large sets of unstructured data from UNIX systems into Hive tables.
- Used Spark DataFrame operations to perform the required data validations and to run analytics on Hive data.
- Implemented Spark jobs in Scala, utilizing DataFrames and the Spark SQL API for faster data processing.
- Configured Spark Streaming to consume data from Kafka and store it in HDFS (see the sketch after this list).
- Developed Pig Latin scripts to load data from output files and store it in HDFS.
- Used Oozie Workflow engine to run multiple Hive and Pig jobs.
- Responsible for performance tuning of Spark applications: setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
- Involved in deploying applications in AWS and maintaining EC2 instances and RDS (Relational Database Service).
- Created detailed AWS security groups that acted as virtual firewalls controlling the traffic allowed to reach one or more EC2 instances.
- Created a data lake by extracting customer data from various sources into HDFS, including data from Excel files, databases, and server log data.
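For illustration, a minimal Scala sketch of the Kafka-to-Spark-Streaming-to-HDFS flow described above, using the spark-streaming-kafka-0-10 direct stream; the broker address, topic, consumer group, batch interval, and output path are hypothetical:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHdfsStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHdfsStream")
    val ssc  = new StreamingContext(conf, Seconds(30))     // 30-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",              // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "clickstream-consumer",      // hypothetical consumer group
      "auto.offset.reset"  -> "latest"
    )

    // Direct stream of micro-batch RDDs from the subscribed Kafka topic
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Persist each non-empty batch of raw message values to HDFS
    stream.map(_.value).foreachRDD { (rdd, time) =>
      if (!rdd.isEmpty())
        rdd.saveAsTextFile(s"hdfs:///data/raw/events/batch_${time.milliseconds}")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```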
Confidential - Gaithersburg, MD
Hadoop Developer
Responsibilities:
- Monitored systems, services and implementation of Hadoop deployment, configuration management, backup, and disaster recovery systems and procedures.
- Developed highly optimized Spark applications to perform data cleansing, validation, transformation, and summarization activities.
- Created Sqoop jobs to handle incremental loads from RDBMS into HDFS to apply Spark Transformations and Actions.
- Worked with systems engineering team to plan and deploy new Hadoop environments and expand existing Hadoop clusters.
- Worked with the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs.
- Handled large datasets using partitioning, Spark in-memory capabilities, broadcasts, and effective, efficient joins and transformations (see the broadcast-join sketch after this list).
- Migrated data from MySQL server to Hadoop using Sqoop for data processing.
- Imported required tables from RDBMS into HDFS using Sqoop and used Spark and Kafka for real-time streaming of data into HBase.
- Enhanced and optimized the product's Spark code to aggregate, group, and run data mining tasks using the Spark framework, and handled JSON data.
- Worked with ELK Stack cluster for importing logs into Logstash, sending them to Elasticsearch nodes and creating visualizations in Kibana.
- Used Apache Spark with the ELK cluster to produce specific visualizations that required more complex data processing.
- Involved in moving log files generated from various sources to HDFS for further processing through Flume.
- Worked on importing metadata into Hive and migrated existing tables and applications to run on Hive and the AWS cloud.
- Worked on partitioning the Hive table and running the scripts in parallel to reduce the run time of the scripts.
- Analyzed data by performing Hive queries and running Pig scripts to understand user behavior.
- Programmed Pig scripts with complex joins, such as replicated and skewed joins, to achieve better performance.
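For illustration, a minimal Scala sketch of the broadcast-join pattern referenced above, joining a large Sqoop-landed fact dataset against a small dimension table without shuffling the large side; the paths, dataset names, and columns are hypothetical:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object CustomerOrderJoin {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CustomerOrderJoin").getOrCreate()

    // Large fact data landed in HDFS by the Sqoop incremental job (hypothetical path)
    val orders = spark.read.parquet("hdfs:///data/landing/orders")

    // Small dimension table: broadcasting it ships a copy to every executor,
    // so the join avoids shuffling the large orders dataset
    val customers = spark.read.parquet("hdfs:///data/landing/customers")

    val enriched = orders.join(broadcast(customers), Seq("customer_id"))

    enriched.write
      .mode("overwrite")
      .partitionBy("order_date")            // partition output for downstream Hive queries
      .parquet("hdfs:///data/curated/orders_enriched")

    spark.stop()
  }
}
```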
Confidential - San Ramon, CA
Hadoop Developer
Responsibilities:
- Developed custom data ingestion adapters to extract log data and clickstream data from external systems and load them into HDFS.
- Used Spark as an ETL tool to perform complex transformations, de-normalization, enrichment, and some pre-aggregations.
- Worked on migrating MapReduce programs into Spark transformations using Scala (see the sketch after this list).
- Scripted complex HiveQL queries on Hive tables to analyze large datasets and wrote complex Hive UDFs to work with sequence files.
- Scheduled workflows using Oozie to automate multiple Hive and Pig jobs, which run independently based on time and data availability.
- Created components such as Hive UDFs to cover missing functionality in Hive and to analyze and process large volumes of data extracted from the NoSQL database MongoDB.
- Used Impala to read, write, and query Hadoop data in HDFS sourced from MongoDB, and configured Kafka to read and write messages from external programs.
- Wrote Pig Scripts to perform ETL procedures on the data in HDFS.
- Collected and aggregated substantial amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Developed workflow in Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Worked on optimizing MapReduce algorithms using combiners and partitioners to deliver the best results, and on application performance optimization for the HDFS cluster.
- Created and maintained Technical documentation for launching Cloudera Hadoop Clusters and for executing Hive queries and Pig Scripts.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Implemented MapReduce programs to handle unstructured data such as XML, JSON, and sequence-file log data.
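For illustration, a minimal Scala sketch of a MapReduce-style aggregation rewritten as Spark transformations, as referenced above; the input path, delimiter, and column positions are hypothetical:

```scala
import org.apache.spark.sql.SparkSession

object LogEventCounts {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("LogEventCounts").getOrCreate()
    val sc = spark.sparkContext

    // Raw log lines previously handled by a MapReduce job (hypothetical path/format)
    val lines = sc.textFile("hdfs:///data/logs/clickstream/*")

    // map + filter + reduceByKey replaces the mapper/combiner/reducer chain;
    // reduceByKey aggregates map-side before the shuffle, much like a combiner
    val counts = lines
      .map(_.split("\t"))
      .filter(_.length > 3)                  // drop malformed records (data cleaning step)
      .map(fields => (fields(2), 1L))        // key on the event-type column (assumed index)
      .reduceByKey(_ + _)

    counts.saveAsTextFile("hdfs:///data/output/event_counts")
    spark.stop()
  }
}
```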
Confidential - Omaha, NE
Java Developer
Responsibilities:
- Involved in Requirement Gathering, Design and Deployment of the application using Scrum (Agile) as Development methodology.
- Participated in developing use cases, use case diagrams, class diagrams, sequence diagrams and high-level activity diagrams using UML from the requirements.
- Involved in implementing DAOs using the Spring-Hibernate ORM, creating the objects and mapping them using Hibernate annotations.
- Implemented the caching mechanism in Hibernate to load data from the Oracle database.
- Implemented a REST API with the Bottle micro-framework, with Oracle as the back-end database.
- Used the CXF API and JAX-RS technologies to develop REST and SOAP based web services.
- Used SoapUI to test both REST and SOAP based web services.
- Used XML based web services tool to push pending orders in Integration Manager.
- Used JavaScript and jQuery to validate input entered through the user interface.
- Used AJAX to update parts of the web page, which improved application performance.
- Developed layout of Web Pages using Tiles and CSS.
- Used SQL developer database tool to build, edit, and format database queries, as well as eliminate performance issues in the code.
- Used the Oracle database for table creation and was involved in writing SQL queries using joins and stored procedures.
- Performed unit testing using the JUnit framework and used Struts test cases for testing action classes.
- Used Ant scripts to build the application and deployed it on WebSphere Application Server.
- Executed the test steps defined in test cases manually and reported bugs in JIRA.