Hadoop/Big Data Developer Resume

South Portland, ME

SUMMARY:

  • Over 7 years of IT experience with strong emphasis on design, development and implementation of Big Data Hadoop data warehouse/business intelligence solutions using Hadoop, HDFS, MapReduce and the Hadoop ecosystem, plus development experience using Java, J2EE, JSP and Servlets.
  • Excellent understanding of Hadoop architecture and various components such as HDFS, YARN, High Availability, and MapReduce programming paradigm.
  • Expertise in setting up processes for Hadoop based application design and implementation.
  • Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
  • Experience in managing and reviewing Hadoop log files.
  • Experienced in processing Big data on the Apache Hadoop framework using MapReduce programs.
  • In depth knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node, Data Node and MRv1 and MRv2 (YARN).
  • Experienced in creating Map Reduce jobs in Java as per the business requirements.
  • Implemented Sqoop jobs for large sets of structured and semi-structured data migration between HDFS and/or other data storage like Hive or RDBMS.
  • Experience in developing pipelines that ingest data from various sources and process it with Hive.
  • Involved in converting Hive queries into Spark SQL transformations using Spark RDDs and Scala.
  • Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext and Spark SQL.
  • Extracted data from log files and pushed it into HDFS using Flume.
  • Knowledge of job workflow scheduling with Oozie and cluster coordination with ZooKeeper.
  • Knowledge of the publish-subscribe messaging system Kafka.
  • Used Kafka for message brokering, streaming and log aggregation to put physical logs into centralized locations.
  • Extracted the data from MySQL, Oracle, SQL Server using Sqoop and loaded data.
  • Extensive knowledge in using SQL Queries for backend database analysis.
  • Strong understanding of NoSQL databases like HBase, MongoDB.
  • Proficient in deploying applications on J2EE application servers like WebSphere, WebLogic, GlassFish and JBoss, and on the Apache Tomcat web server.
  • Worked extensively on Web services and the Service-Oriented Architecture (SOA), Simple Object Access Protocol (SOAP).
  • Motivated self-starter with excellent communication, presentation and problem-solving skills, committed to learning new technologies.
  • Committed to professionalism and highly organized, able to work under strict deadlines with attention to detail, with excellent written and verbal communication skills.

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Pig, Hive, HBase, Sqoop, Oozie, Spark, Scala, Kafka, ZooKeeper, Impala, Cassandra, MongoDB

Programming languages: C, C++, Java, Linux shell script, Python

Database: NoSQL, Oracle, DB2, MySQL, SQL Server, MS Access, HBase

Operating Systems: Windows, UNIX, Linux

Web Technologies: HTML, CSS, JavaScript, Servlets, XML.

IDE Tools: Eclipse, NetBeans

Java/J2EE Technologies: J2EE, Servlets, JSP, Struts, Hibernate, EJB, XML, MVC, Spring

Development Approach: Agile, Waterfall

Version Control: CVS, SVN, Git

Reporting Tools: Jaspersoft iReport, Tableau, QlikView

WORK EXPERIENCE:

Hadoop/Big Data Developer

Confidential, South Portland, ME

Responsibilities:

  • Responsible for building scalable distributed data solutions using Hadoop.
  • Understood business needs, analyzed functional specifications and mapped them to the design and development of programs and algorithms.
  • Involved in loading data from RDBMS into HDFS using Sqoop.
  • Handled delta processing (incremental updates) using Hive and processed the data in Hive tables.
  • Optimized MapReduce code and Hive scripts for better scalability, reliability and performance.
  • Assisted with performance tuning and monitoring.
  • Developed Oozie workflows for application execution.
  • Involved in creating Hive Tables, loading with data and writing Hive queries.
  • Developed Pig Latin scripts for extracting data from the source system.
  • Documented system processes and procedures for future reference, including design reviews, code reviews and test development.
  • Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
  • Involved in all phases of the SDLC including analysis, design, development, testing, and deployment of Hadoop cluster.
  • Extensively worked on Oozie for batch processing and scheduling workflows dynamically.
  • Developed transformations and aggregated the data for large data sets using Pig and Hive scripts.
  • Used partitioning and bucketing in Hive tables and ran scripts in parallel to improve performance.
  • Thorough knowledge of Spark architecture and how RDDs work internally.
  • Streamed data in real time using Spark with Kafka.
  • Exposure to Spark SQL.
  • Experience in the Scala programming language, used extensively with Spark for data processing.
  • Developed Spark scripts by using Scala shell commands as per the requirement.
  • Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism and memory tuning.
  • Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.

Environment: HDFS, Map Reduce, Java, Hive, Oozie, Spark, Scala, Shell Scripting, Linux, HUE, Sqoop, Flume, Kafka and Oracle.

Hadoop/Big Data Developer

Confidential, Denver, CO

Responsibilities:

  • Developed solutions to load data into HDFS, analyzed the data using MapReduce, Pig and Hive, and delivered summary results from Hadoop to downstream systems.
  • Used Kettle widely in order to import data from various systems/sources like MySQL into HDFS.
  • Applied various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
  • Involved in creating Hive tables, and then applied HiveQL on those tables for data validation.
  • Monitored the running MapReduce programs on the cluster.
  • Installed and configured MapReduce, Hive and HDFS; implemented a CDH3 Hadoop cluster on CentOS.
  • Involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run MapReduce jobs in the backend.
  • Wrote MapReduce programs to convert text files into Avro and load them into Hive tables.
  • Implemented the workflows using Apache Oozie framework to automate tasks.
  • Developed design documents considering all possible approaches and identifying the best of them.
  • Developed scripts to automate end-to-end data management and synchronization between all the clusters.
  • Explored Spark to improve the performance of existing algorithms in Hadoop.
  • Imported data from different sources like HDFS/HBase into Spark RDDs.
  • Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
  • Configured Spark Streaming to receive ongoing data from Kafka and store the stream in HDFS.
  • Used various Spark transformations and actions for cleansing the input data.
  • Developed shell scripts to generate Hive CREATE statements from the data and load the data into the tables.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
  • Involved in requirements gathering, design, development and testing.
  • Followed agile methodology for the entire project.
  • Prepared technical design documents and detailed design documents.

Environment: Hive, HBase, Flume, Java, Maven, Impala, Splunk, Pig, Spark, Oozie, Oracle, YARN, GitHub, JUnit, Tableau, Unix, Cloudera, Sqoop, HDFS, Tomcat, Scala, Python.

Hadoop Developer

Confidential, Memphis, TN

Responsibilities:

  • Performed Hadoop cluster administration, including adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring and troubleshooting.
  • Worked on a Hadoop cluster that ranged from 5 to 10 nodes during the pre-production stage and was sometimes extended up to 25 nodes during production.
  • Involved in the configuration of System architecture by implementing Hadoop file system in master and slave systems in Red Hat Linux Environment.
  • Developed Map Reduce programs to cleanse data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
  • Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
  • Responsible for building scalable distributed data solutions using Hadoop and migrating legacy retail application ETL to Hadoop.
  • Wrote SQL queries to process the data using Spark SQL.
  • Extracted data from different databases and copied it into HDFS using Sqoop.
  • Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
  • Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
  • Used Maven with SOAP Web services (JAX-WS) using XML, WSDL and Apache CXF.
  • Used Spring Integration (SI) to expose some services of our application for other applications in the company to use.
  • Used SOAP UI to test the SOAP Web services.
  • Created complex Stored Procedures, Triggers and User Defined Functions to support the front-end application.
  • Participated in troubleshooting production issues and coordinated with team members to resolve defects under tight timelines.
  • Involved in end-to-end implementation in the production environment, validating the implemented modules.

Environment: Apache Hadoop, Hive, Pig, HDFS, Java, UNIX, MySQL, Eclipse, Sqoop, REST/SOAP API

Hadoop Developer

Confidential, Atlanta, GA

Responsibilities:

  • Ingested historical medical claims data into HDFS from different data sources, including databases and flat files, and processed it using Spark, Scala and Python.
  • Hive external tables were used for raw data and managed tables were used for intermediate tables.
  • Developed Hive Scripts (HQL) for automating the joins for different sources.
  • Responsible for data analysis, validation, cleansing, collection and reporting using R.
  • Worked with GIT, Jira and Tomcat in Linux/Windows Environment.
  • Experienced in Shell scripting, automating using crontab.
  • Developed the Shell scripts for batch reports based on the given requirements.
  • Wrote code using Teradata analytical functions and BTEQ SQL, and wrote UNIX scripts to validate, format and execute the SQL.
  • Developed interactive dashboards, created various Ad hoc reports for users in Tableau by connecting various data sources.
  • Implemented classification using supervised learning methods such as logistic regression, decision trees, KNN and Naive Bayes.
  • Performed exploratory data analysis and developed an interactive dashboard using Tableau.
  • Involved in resolving defects found in testing and production support.
  • Wrote Sub Queries, Stored Procedures, Triggers, Cursors, and Functions on Oracle database.

Environment: Hadoop, Hive, MapReduce, HDFS, Sqoop, HBase, Pig, Oozie, Java, Bash, MySQL, Oracle, Windows and Linux.

Java Developer

Confidential

Responsibilities:

  • Gathered specifications from the requirements.
  • Developed the application using Struts MVC 2 architecture.
  • Developed JSP custom tags and Struts tags to support custom User Interfaces.
  • Developed front-end pages using JSP, HTML and CSS.
  • Developed core Java classes for utilities, business logic and test cases.
  • Developed SQL queries using MySQL and established database connectivity.
  • Used stored procedures for performing different database operations.
  • Used JDBC for interacting with the database.
  • Developed servlets for processing requests.
  • Used Java exception handling to manage error conditions.
  • Designed sequence diagrams and use case diagrams for proper implementation.
  • Used Rational Rose for design and implementation.

Environment: JSP, HTML, CSS, JavaScript, MySQL, JDBC, Servlets, Exception Handling, UML, Rational Rose.

Java Developer

Confidential

Responsibilities:

  • Involved in preparing functional definition documents and in discussions with business users and the testing team to finalize the technical design documents.
  • Enhanced the Web Application using Struts.
  • Created business logic and application components in the Struts framework using JSP and Servlets.
  • Documented the code using Javadoc-style comments.
  • Wrote client-side validation using the Struts Validator framework and JavaScript.
  • Wrote unit test cases for different modules and resolved the test findings.
  • Implemented SOAP web services to communicate with other systems.
  • Wrote JSPs and Servlets and deployed them on WebLogic application server.
  • Developed automated Build files using Maven.
  • Used Subversion for version control and log4j for logging errors.
  • Wrote Oracle PL/SQL stored procedures and triggers.
  • Helped the production support team resolve trouble reports.
  • Involved in Release Management and Deployment Process.

Environment: Java, J2EE, Struts, JSP, Servlets, JavaScript, Hibernate, SOAP, WebLogic, Log4j, Maven, CVS, PL/SQL, Oracle, Windows.
