Hadoop developer Resume New York City, NY - Hire IT People

SUMMARY

7+ years of experience in software development, including 4 years with Big Data and Hadoop ecosystem components and 3 years in Java development.
Proficient in Java, Hadoop MapReduce, Pig, Hive, Oozie, Sqoop, HBase, Scala, Python, Kafka, NoSQL databases and AWS.
Experience in installing, configuring, supporting and managing Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4, CDH5) distributions on Amazon Web Services (AWS).
Configured and deployed a multi-node Cloudera Hadoop cluster on Amazon EC2 instances and a pseudo-distributed cluster on local Linux machines for proofs of concept (POCs).
Experienced in building highly reliable, scalable Big Data solutions on the Cloudera, Hortonworks and AWS EMR Hadoop distributions.
Expertise in developing Spark applications using the Spark Core, Spark SQL and Spark Streaming APIs in Scala and Python, and deploying them on YARN clusters in client and cluster modes using spark-submit.
Involved in creating RDDs, DataFrames and Datasets and applying transformations and actions to them using Scala and Python, and in integrating the applications with the Spark framework using the SBT and Maven build automation tools.
Experience in using DStreams in streaming, accumulators, broadcast variables and various levels of caching.
Deep understanding of performance tuning and partitioning for optimizing Spark applications.
Experience in streaming data ingestion using Flume and Kafka, and in stream processing platforms like Apache Storm and Spark Streaming.
In-depth understanding of NoSQL databases such as HBase, Cassandra and MongoDB and their integration with Hadoop clusters.
Experience in using Apache Flume to collect, aggregate and move large volumes of data from various sources such as web servers and telnet sources.
Used Cassandra CQL with Java APIs to retrieve data from Cassandra tables.
Strong working experience in extracting, wrangling, ingesting, processing, storing, querying and analyzing structured, semi-structured and unstructured data.
Good understanding of Hadoop MRv1 and MRv2 (YARN) architecture.
Developed, deployed and supported several MapReduce applications in Java to handle semi-structured and unstructured data.
Very good knowledge of map-side joins, reduce-side joins, shuffle and sort, the distributed cache, compression techniques, and multiple Hadoop input and output formats.
Experience in working with CSV, text, SequenceFile, Avro, Parquet, ORC and JSON data formats.
Expertise in working with the Hive data warehouse tool: creating tables, distributing data through static and dynamic partitioning and bucketing, and optimizing HiveQL queries.
Expertise in moving structured schema data between Pig and Hive using HCatalog.
Designed and implemented Hive and Pig UDFs in Python and Java for evaluating, filtering, loading and storing data.
Experience in migrating data with Sqoop from HDFS and Hive to relational database systems and vice versa, according to client requirements.
Experience in managing and monitoring Apache Hadoop clusters with Ambari.
Experience with RDBMS like SQL Server, MySQL, Oracle and data warehouses like Teradata and Netezza.
Good knowledge of AWS compute services such as EC2, which provide fast and efficient processing of Big Data.
Experience in job workflow scheduling and monitoring tools like Oozie.
Proficient knowledge of and hands-on experience in writing shell scripts on Linux.
Developed core modules in large cross-platform applications using Java, JSP, Servlets, Hibernate, RESTful, JDBC, JavaScript, XML, and HTML.
Extensive experience in developing and deploying applications using WebLogic, Apache Tomcat and JBoss.
Development experience with RDBMS, including writing SQL queries, views, stored procedures, triggers, etc.
Strong understanding of Software Development Lifecycle (SDLC) and various methodologies (Waterfall, Agile).

TECHNICAL SKILLS

Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Ambari, ZooKeeper, Kafka, Apache Spark.

Hadoop Distributions: Cloudera, Hortonworks, Apache, AWS EMR

Languages: C, Java, PL/SQL, Python, Pig Latin, HiveQL, Scala

IDE Tools: Eclipse, NetBeans, IntelliJ.

Web Technologies: HTML, CSS, JavaScript, XML, JSP, RESTful.

Operating Systems: Windows (XP, 7, 8, 10), UNIX, Linux, Ubuntu, CentOS.

Reporting/ETL Tools: Tableau, Power View for Microsoft Excel, Talend.

Databases: Oracle, SQL Server, MySQL, MS Access, NoSQL databases (HBase, Cassandra, MongoDB)

Build Automation Tools: SBT, Ant, Maven

PROFESSIONAL EXPERIENCE

Confidential - New York City, NY

Hadoop Developer

Responsibilities:

Involved in building scalable distributed data solutions using Spark and HDP.
Explored the Spark framework to improve the performance of existing Hadoop algorithms using the Spark Core, Spark SQL and Spark Streaming APIs.
Ingested data from relational databases into HDFS on a regular basis using Sqoop incremental import.
Involved in developing Spark Scala applications to process and analyze text data from emails, complaints, forums, and clickstreams to achieve comprehensive customer care.
Extracted structured data from multiple relational data sources as DataFrames in Spark SQL.
Involved in schema extraction from file formats such as Avro and Parquet.
Involved in converting the data from Avro format to Parquet format and vice versa.
Transformed DataFrames to meet the requirements of the data science team.
Worked on the integration of the Kafka service for stream processing, website tracking and log aggregation.
Involved in configuring and developing Kafka producers, consumers, topics and brokers in Java.
Involved in data modeling and in ingesting data into Cassandra using CQL, Java APIs and other drivers.
Implemented CRUD operations using CQL on top of the Cassandra file system.
Involved in writing Pig Scripts to wrangle the log data and store it back to HDFS and Hive tables.
Involved in accessing Hive tables through HiveContext, transforming the data and storing it in HBase.
Involved in creating Hive tables from a wide range of data formats such as text, SequenceFile, Avro, Parquet and ORC.
Analyzed transactional data in HDFS using Hive and optimized query performance by segregating the data using clustering and partitioning.
Developed Spark applications implementing various business logic in Scala and Python.
Involved in moving data between HDFS and AWS S3 using Apache DistCp.
Involved in pulling data from an Amazon S3 data lake and building Hive tables using HiveContext in Spark.
Involved in running Hive queries and Spark jobs on data stored in S3.
Ran short-term ad hoc queries and jobs on data stored in S3 using AWS EMR.

Environment: HDP Hadoop, HDFS, Pig 0.15.0, Hive 2.2.0, Kafka 1.1.0, Sqoop, Shell Scripting, Spark 2.0, AWS S3, Linux (CentOS), HBase, MapReduce, Scala 2.10.4, Eclipse 4.6.

Confidential - Fort Myers, FL

Hadoop Developer

Responsibilities:

Worked closely with business analysts to gather requirements and design reliable and scalable distributed solutions using the Hortonworks Hadoop distribution.
Ingested structured data from MySQL and SQL Server into HDFS using Sqoop incremental imports scheduled to run periodically.
Used HCatalog to move structured data between Pig relations and Hive.
Involved in developing Spark Scala applications using the Spark Core and Spark SQL APIs.
Involved in a POC to evaluate the efficiency of Spark applications on a Mesos cluster.
Developed and implemented workflows using Apache Oozie for task automation.
Used Sqoop to move data from HDFS and Hive to relational database systems and vice versa, as required.
Responsible for creating Hive tables, loading structured data produced by MapReduce jobs into the tables, and writing Hive queries to analyze the logs further and identify issues and behavioral patterns.
Implemented schema extraction for Parquet and Avro file formats in Hive.
Used cost-based optimization (CBO), vectorization and other Hive optimizations to improve query performance.
Used Tableau to connect to HiveServer2 and generate daily reports of customer purchases.
Involved in working with data formats such as CSV, text, SequenceFile, Avro, Parquet, ORC, JSON and customized Hadoop formats.
Exported processed data from HDFS to the data warehouse (DWH) using Sqoop export through a staging table.

Environment: HDFS, Spark 1.5.2/1.6, Eclipse 4.6, AWS, MapReduce, Hive 1.2.0, Pig 0.15.0, Java, SQL, Sqoop 1.4.6, Linux (CentOS), ZooKeeper 3.4.8, HBase, Maven.

Confidential - Rochester, NY

Hadoop/Spark Developer

Responsibilities:

Developed Spark applications in Java and implemented an Apache Spark data processing project to handle data from various RDBMS and streaming sources.
Handled importing of data from various data sources, performed data control checks using Spark and loaded the data into HDFS.
Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs in Scala.
Used the Spark API on Cloudera Hadoop YARN to perform analytics on data in HDFS.
Used Spark SQL to load JSON data, create a SchemaRDD and load it into Hive tables, and to handle structured data.
Developed Spark programs using the Spark and Java APIs and performed transformations and actions on RDDs.
Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
Used Spark and Spark SQL to read Parquet data and create tables in Hive using the Scala API.
Implemented Spark jobs in Scala using DataFrames and the Spark SQL API for faster data processing.
Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
Handled large datasets during ingestion using partitioning, Spark's in-memory capabilities, broadcast variables, and efficient joins and transformations.
Involved in writing real-time processing and core jobs using Spark Streaming, with Kafka as the data pipeline.
Worked on a streaming pipeline that uses Spark to read data from Kafka, transform it and write it to HDFS.
Analyzed weblog data using HiveQL, integrated Oozie with the rest of the Hadoop stack, and used cluster coordination services through ZooKeeper.

Environment: Cloudera (CDH3), Sqoop 1.4.6, Scala 2.11.8, Spark, Kafka 1.1.0, Spark SQL, Spark Streaming, Java (JDK 1.6), Hive 1.2.1, Elasticsearch, Git, AWS, Spark cluster, Hadoop Framework, DB2.

Confidential

Java Developer

Responsibilities:

Responsible for requirement gathering and analysis through interaction with end users.
Involved in designing use-case, class and interaction diagrams using UML.
Involved in the design and development of applications that satisfy dynamic business needs.
Developed the front end of the application using HTML, CSS and JavaScript.
Used JavaScript to perform client-side validations and servlets for server-side validation.
Developed rich Internet web applications using Java applets and JavaFX.
Involved in creating database SQL queries and stored procedures. Implemented Singleton classes for loading properties and static data from Microsoft SQL Server.
Worked with the QA team to conduct integrated (application and database) stress testing, performance analysis and tuning.
Used the Maven build tool to deploy the application on the WebLogic application server.
Interacted with the client regarding project status and new design proposals, and handled technical issues related to system development and maintenance.
Provided technical expertise on new design approaches to improve the maintenance and performance of the application.

Environment: Java 1.6, J2EE, Eclipse, Servlets, Spring, JSP, JavaScript, HTML, JDBC, SQL, Microsoft SQL Server 2008, UNIX, XML, BEA WebLogic.

Confidential

Java Developer

Responsibilities:

Involved in the analysis, design and development of web applications based on J2EE.
Used the Struts framework to manage navigation and page flow.
Developed an EJB session bean that acts as a facade, accessing the business entities through their local home interfaces.
Designed the user interface using HTML, CSS, JavaScript and jQuery.
Experience in writing secure and optimized code.
Used Log4j to debug and generate new logs for the application.
Used JDBC to access data in the Oracle database. Created database tables and stored procedures using PL/SQL in Oracle DB.
Implemented client-side validation on web forms as required.
Performed Unit Tests on the application to verify and identify various scenarios.
Used Eclipse for development, testing, and code review.
Involved in the release management process for the QA, UAT and production environments.
Used Maven to build the application EAR for deployment on WebLogic application servers.
Developed the project in an Agile environment.

Environment: J2EE, Java, Eclipse 3.6, EJB, Java Beans, JDBC, JSP, Struts, Design Patterns, BEA WebLogic, PL/SQL, DB2, UML, CVS, JUnit, Log4j.
