Hadoop/Big Data Developer Resume South Portland, ME - Hire IT People

SUMMARY:

Over 7 years of IT experience with a strong emphasis on the design, development and implementation of Big Data data warehouse/business intelligence solutions using Hadoop, HDFS, MapReduce and the Hadoop ecosystem, along with development experience using Java, J2EE, JSP and Servlets.
Excellent understanding of Hadoop architecture and various components such as HDFS, YARN, High Availability, and MapReduce programming paradigm.
Expertise in setting up processes for Hadoop based application design and implementation.
Experience in importing and exporting data using Sqoop from HDFS to relational database systems and vice versa.
Experience in managing and reviewing Hadoop log files.
Experienced in processing Big data on the Apache Hadoop framework using MapReduce programs.
In-depth knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode and DataNode, in both MRv1 and MRv2 (YARN).
Experienced in creating Map Reduce jobs in Java as per the business requirements.
Implemented Sqoop jobs for large sets of structured and semi-structured data migration between HDFS and/or other data storage like Hive or RDBMS.
Experience in developing pipelines that ingest data from various sources and process it with Hive.
Involved in converting Hive queries into Spark SQL transformations using Spark RDDs and Scala.
Explored Spark to improve the performance of existing Hadoop algorithms using SparkContext and Spark SQL.
Extracted data from log files and pushed it into HDFS using Flume.
Knowledge of the job workflow scheduling and monitoring tool Oozie and the coordination service ZooKeeper.
Knowledge of Publish-subscribe messaging system Kafka.
Used Kafka for message brokering, streaming and log aggregation to put physical logs into centralized locations.
Extracted the data from MySQL, Oracle, SQL Server using Sqoop and loaded data.
Extensive knowledge in using SQL Queries for backend database analysis.
Strong understanding of NoSQL databases like HBase, MongoDB.
Proficient in deploying applications on J2EE application servers like WebSphere, WebLogic, GlassFish and JBoss, and on the Apache Tomcat web server.
Worked extensively on Web services and the Service-Oriented Architecture (SOA), Simple Object Access Protocol (SOAP).
Motivated self-starter with Excellent Communication, Presentation and Problem-solving skills and committed to learning new technologies.
Committed to professionalism and highly organized, with the ability to work under strict deadlines, attention to detail and excellent written communication skills.

TECHNICAL SKILLS:

Hadoop/Big Data: HDFS, MapReduce, Pig, Hive, HBase, Sqoop, Oozie, Spark, Scala, Kafka, ZooKeeper, Impala, Cassandra, MongoDB

Programming languages: C, C++, Java, Linux shell script, Python

Database: NoSQL, Oracle, DB2, MySQL, SQL Server, MS Access, HBase

Operating Systems: Windows, UNIX, LINUX.

Web Technologies: HTML, CSS, JavaScript, Servlets, XML.

IDE Tools: Eclipse, NetBeans

Java/J2EE Technologies: J2EE, Servlets, JSP, Struts, Hibernate, EJB, XML, MVC, Spring.

Development Approach: Agile, Waterfall

Version Control: CVS, SVN, Git

Reporting Tools: Jaspersoft iReport, Tableau, QlikView

WORK EXPERIENCE:

Hadoop/Big Data Developer

Confidential, South Portland, ME

Responsibilities:

Responsible for building scalable distributed data solutions using Hadoop.
Understood business needs, analyzed functional specifications and mapped them to the design and development of programs and algorithms.
Involved in loading data from RDBMS into HDFS using Sqoop.
Handled delta processing (incremental updates) using Hive and processed the data in Hive tables.
Optimized MapReduce code and Hive scripts for better scalability, reliability and performance.
Assisted with performance tuning and monitoring.
Developed the Oozie workflows for application execution.
Involved in creating Hive Tables, loading with data and writing Hive queries.
Developed Pig Latin scripts for extracting data from the source system.
Documented the systems processes and procedures for future references including design and code reviews, test development.
Involved in story-driven agile development methodology and actively participated in daily scrum meetings.
Involved in all phases of the SDLC including analysis, design, development, testing, and deployment of Hadoop cluster.
Extensively worked on Oozie for batch processing and scheduling workflows dynamically.
Developed transformations and aggregated the data for large data sets using Pig and Hive scripts.
Used partitioning and bucketing in Hive tables and ran the scripts in parallel to improve performance.
Have thorough knowledge of Spark architecture and how RDDs work internally.
Streamed data in real time using Spark with Kafka.
Have exposure to Spark SQL.
Have experience in Scala programming language and used it extensively with Spark for data processing.
Developed Spark scripts by using Scala shell commands as per the requirement.
Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
Experienced in performance tuning of Spark applications, including setting the right batch interval, the correct level of parallelism and memory tuning.
Optimized existing algorithms in Hadoop using SparkContext, Spark SQL, DataFrames and pair RDDs.

Environment: HDFS, Map Reduce, Java, Hive, Oozie, Spark, Scala, Shell Scripting, Linux, HUE, Sqoop, Flume, Kafka and Oracle.

Hadoop/Big Data Developer

Confidential, Denver, CO

Responsibilities:

Developed solutions to load data into HDFS, analyzed the data using MapReduce, Pig and Hive, and produced summary results from Hadoop for downstream systems.
Used Kettle widely in order to import data from various systems/sources like MySQL into HDFS.
Performed various performance optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
Involved in creating Hive tables, and then applied HiveQL on those tables for data validation.
Monitoring the running MapReduce programs on the cluster.
Installed and configured MapReduce, Hive and HDFS; implemented a CDH3 Hadoop cluster on CentOS.
Involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run MapReduce jobs in the backend.
Wrote MapReduce programs to convert text files into Avro and load them into Hive tables.
Implemented the workflows using Apache Oozie framework to automate tasks.
Developing design documents considering all possible approaches and identifying best of them.
Developed scripts and automated data management from end to end and sync up b/w all the clusters.
Explored Spark to improve the performance of existing algorithms in Hadoop.
Imported data from sources such as HDFS and HBase into Spark RDDs.
Experienced with SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
Configured Spark Streaming to consume ongoing data from Kafka and store the stream in HDFS.
Used various Spark transformations and actions for cleansing the input data.
Developed shell scripts to generate Hive CREATE statements from the data and load the data into the tables.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
Involved in gathering the requirements, designing, development and testing.
Followed agile methodology for the entire project.
Prepared technical design documents and detailed design documents.

Environment: Hive, HBase, Flume, Java, Maven, Impala, Splunk, Pig, Spark, Oozie, Oracle, YARN, GitHub, JUnit, Tableau, Unix, Cloudera, Sqoop, HDFS, Tomcat, Scala, Python.

Hadoop Developer

Confidential, Memphis, TN

Responsibilities:

Performed Hadoop cluster administration, including adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring and troubleshooting.
Worked on a Hadoop cluster that ranged from 5 to 10 nodes during the pre-production stage and was sometimes extended up to 25 nodes during production.
Involved in the configuration of System architecture by implementing Hadoop file system in master and slave systems in Red Hat Linux Environment.
Developed Map Reduce programs to cleanse data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
Collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
Responsible for building scalable distributed data solutions using Hadoop and migrating legacy retail ETL applications to Hadoop.
Wrote SQL queries to process the data using Spark SQL.
Extracted data from different databases and copied it into HDFS using Sqoop.
Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
Used Maven with SOAP Web services (JAX-WS) using XML, WSDL and Apache CXF.
Used Spring Integration (SI) to expose some services of our application for other applications in the company to use.
Used SOAP UI to test the SOAP Web services.
Created complex Stored Procedures, Triggers and User Defined Functions to support the front-end application.
Participated in troubleshooting production issues and coordinated with team members on defect resolution under tight timelines.
Involved in end to end implementation in the production environment validating the implemented modules.

Environment: Apache Hadoop, Hive, Pig, HDFS, Java, UNIX, MySQL, Eclipse, Sqoop, REST/SOAP API

Hadoop Developer

Confidential, Atlanta, GA

Responsibilities:

Ingested historical medical claims data into HDFS from different data sources, including databases and flat files, and processed it using Spark, Scala and Python.
Hive external tables were used for raw data and managed tables were used for intermediate tables.
Developed Hive Scripts (HQL) for automating the joins for different sources.
Responsible for data analysis, validation, cleansing, collection and reporting using R.
Worked with Git, Jira and Tomcat in Linux/Windows environments.
Experienced in Shell scripting, automating using crontab.
Developed the Shell scripts for batch reports based on the given requirements.
Wrote code using Teradata analytical functions and BTEQ SQL, and wrote UNIX scripts to validate, format and execute the SQL.
Developed interactive dashboards, created various Ad hoc reports for users in Tableau by connecting various data sources.
Implemented classification using supervised learning algorithms such as logistic regression, decision trees, KNN and Naive Bayes.
Performed exploratory data analysis and developed an interactive dashboard using Tableau.
Involved in resolving defects found in testing and production support.
Wrote Sub Queries, Stored Procedures, Triggers, Cursors, and Functions on Oracle database.

Environment: Hadoop, Hive, MapReduce, HDFS, Sqoop, HBase, Pig, Oozie, Java, Bash, MySQL, Oracle, Windows and Linux.

Java Developer

Confidential

Responsibilities:

Gathered specifications from the requirements.
Developed the application using Struts MVC 2 architecture.
Developed JSP custom tags and Struts tags to support custom User Interfaces.
Developed front-end pages using JSP, HTML and CSS.
Developed core Java classes for utility classes, business logic, and test cases.
Developed SQL queries using MySQL and established connectivity.
Used Stored Procedures for performing different database operations.
Used JDBC for interacting with the database.
Developed servlets for processing requests.
Used Exception Handling for handling exceptions.
Designed sequence diagrams and use case diagrams for proper implementation.
Used Rational Rose for design and implementation.

Environment: JSP, HTML, CSS, JavaScript, MySQL, JDBC, Servlets, Exception Handling, UML, Rational Rose.

Java Developer

Confidential

Responsibilities:

Involved in the preparation of functional definition documents and in discussions with business users and the testing team to finalize the technical design documents.
Enhanced the Web Application using Struts.
Created business logic and application in Struts Framework using JSP, and Servlets.
Documented the code using Javadoc-style comments.
Wrote client-side validation using the Struts Validator framework and JavaScript.
Wrote unit test cases for different modules and resolved the test findings.
Implemented SOAP web services to communicate with other systems.
Wrote JSPs, Servlets and deployed them on WebLogic Application server.
Developed automated Build files using Maven.
Used Subversion for version control and log4j for logging errors.
Wrote Oracle PL/SQL stored procedures and triggers.
Helped the production support team resolve trouble reports.
Involved in Release Management and Deployment Process.

Environment: Java, J2EE, Struts, JSP, Servlets, JavaScript, Hibernate, SOAP, WebLogic, Log4j, Maven, CVS, PL/SQL, Oracle, Windows.
