
Big Data Hadoop Developer Resume


Jersey City, NJ

SUMMARY:

  • 8+ years of IT experience across multiple domains with the Hadoop ecosystem and Java/J2EE technologies.
  • Expertise in Hadoop ecosystem components such as HDFS, MapReduce, YARN, Hive, Pig, Sqoop, HBase, and Flume for data analytics.
  • Strong hands-on experience with the Spark Core, Spark SQL, Spark Streaming, and Spark machine learning modules using Scala and Python, with a good understanding of the Driver, Executors, and the Spark web UI.
  • Solid understanding of RDD and DStream operations in Apache Spark, i.e., transformations and actions, persistence (caching), accumulators, broadcast variables, and broadcast optimization (see the sketch after this list).
  • In-depth understanding of Apache Spark job execution components such as the DAG, lineage graph, DAG Scheduler, Task Scheduler, stages, and tasks.
  • Deep knowledge of the core concepts of the MapReduce framework, the Hadoop ecosystem, and YARN.
  • Deployed Spark applications on various cluster managers such as Spark Standalone, YARN, and Mesos.
  • Worked with Spark-related REST APIs such as the Cluster API and Workspace API.
  • Migrated Python machine learning modules to scalable, high-performance, fault-tolerant distributed systems such as Apache Spark.
  • Strong experience with Spark SQL UDFs, Hive UDFs, and Spark SQL performance tuning. Hands-on experience working with input file formats such as ORC, Parquet, JSON, and Avro.
  • Hands-on experience streaming live data from DB2 into HBase tables using Spark Streaming and Apache Kafka.
  • Capable of processing large sets of structured, semi-structured, and unstructured data.
  • Experience using Flume to stream data into HDFS from various sources.
  • Proficient in designing and querying NoSQL databases such as HBase and Cassandra.
  • Skilled in migrating data from different databases to HDFS and Hive using Sqoop.
  • Experience with job workflow scheduling and coordination tools such as Oozie and ZooKeeper.
  • Expertise in writing MapReduce jobs in Java for processing large sets of structured, semi-structured, and unstructured data and storing the results in HDFS.
  • Hands-on experience in database design using PL/SQL to write stored procedures, functions, and triggers, and strong experience writing complex queries against Oracle, DB2, and MySQL.
  • Hands-on experience designing and executing test cases to verify application functionality using Selenium IDE, Selenium WebDriver, Firebug, and TestNG.
  • Good knowledge of the Software Development Life Cycle (SDLC) and Software Testing Life Cycle (STLC).
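
A minimal PySpark sketch of the RDD concepts referenced above (transformations, actions, caching, broadcast variables, and accumulators); the sample data and lookup table are illustrative assumptions, not drawn from any project.

  from pyspark.sql import SparkSession

  spark = SparkSession.builder.appName("rdd-basics-sketch").getOrCreate()
  sc = spark.sparkContext

  # Illustrative input: (user_id, country_code) pairs
  events = sc.parallelize([("u1", "US"), ("u2", "IN"), ("u3", "US"), ("u4", "UK")])

  # Broadcast a small lookup table once to every executor instead of shipping it per task
  country_names = sc.broadcast({"US": "United States", "IN": "India", "UK": "United Kingdom"})

  # Accumulator to count records with an unknown country code
  unknown = sc.accumulator(0)

  def resolve(pair):
      user, code = pair
      if code not in country_names.value:
          unknown.add(1)
      return (country_names.value.get(code, "Unknown"), 1)

  # Transformations are lazy; cache the mapped RDD because it is reused by later actions
  resolved = events.map(resolve).cache()

  # Actions trigger execution of the whole lineage
  counts = resolved.reduceByKey(lambda a, b: a + b).collect()
  print(counts, "unknown codes:", unknown.value)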

TECHNICAL SKILLS:

Big Data Technologies: Hadoop, Spark, Hive, Kafka, AWS, HBase, Flume, Pig, Sqoop, MapReduce, Impala, Oozie, Apache Zeppelin

Big Data Distributions: Hortonworks, Cloudera, Amazon EMR

Programming languages: Java, Scala, Python, SQL, PL/SQL, Shell Scripting

Operating Systems: Windows, Linux (Ubuntu)

Databases: Oracle, SQL Server, MySQL, Cassandra

IDEs / Development Tools: Eclipse, IntelliJ IDEA

Java Technologies: JSP, J2EE, EJB, Servlets, JUnit, Spring, Hibernate

Web Technologies: HTML, CSS, JavaScript, jQuery, AJAX

Web Services: RESTful, SOAP, RPC

Frameworks: Jakarta Struts 1.x, Spring 2.x, Hibernate

Development methodologies: Agile, Waterfall

Logging Tools: Log4j

Application/Web Servers: Apache Tomcat, WebSphere, WebLogic

Messaging Services: Kafka

Version Control Tools: Git, SVN

PROFESSIONAL EXPERIENCE:

Confidential, Jersey City, NJ

Big Data Hadoop Developer

Responsibilities:

  • Installed and configured HDFS and Hadoop MapReduce, and developed various MapReduce jobs in Java for data cleaning and preprocessing.
  • Analyzed data using low-level Spark APIs such as RDDs and DStreams with Scala and Python.
  • Performed complex mathematical, statistical, and machine learning analysis using Spark MLlib and Spark Streaming.
  • Used the Curator API on Elasticsearch for data backup and restore operations.
  • Performed data ingestion from various data sources.
  • Worked with various relational (SQL) and NoSQL databases for transferring data to and from HDFS.
  • Used Impala for interactive data processing on top of Hive tables.
  • Understanding of data storage and retrieval techniques, ETL, and databases, including graph stores, relational databases, tuple stores, NoSQL stores, Hadoop, Pig, MySQL, and Oracle.
  • Experience using the Avro, Parquet, RCFile, and JSON file formats, and developed UDFs using Hive and Pig.
  • Optimized MapReduce jobs to use HDFS efficiently by applying Gzip, LZO, and Snappy compression (illustrated in the sketch after this list).
  • Imported and exported data to and from HDFS and Hive using Sqoop.
  • Analyzed log data, filtered the required columns via Logstash configuration, and sent the results to Elasticsearch.
  • Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
  • Worked on big data integration and analytics based on Hadoop, Solr, Spark, Kafka, and Storm technologies.
  • Continuously monitored and managed Hadoop cluster using Cloudera Manager.
  • Performed POCs using technologies such as Spark, Kafka, and Scala.
  • Created Hive tables, loaded them with data, and wrote Hive queries.
  • Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
  • Managed and reviewed Hadoop log files.
  • Executed test scripts to support test driven development and continuous integration.
  • Created Pig Latin scripts to sort, group, join, and filter the enterprise-wide data.
  • Worked on tuning the performance of Pig queries.
  • Scheduled workflows and automated code execution on a timely basis using Oozie.
  • Extensive working knowledge of partitioned tables, UDFs, performance tuning, compression-related properties, and the Thrift server in Hive.
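
A brief, illustrative PySpark sketch of the file-format and UDF work described above: reading Parquet input, applying a Spark SQL UDF, and writing Snappy-compressed ORC partitioned by date. The paths and column names are assumptions made for the example, not project details.

  from pyspark.sql import SparkSession
  from pyspark.sql.functions import udf, col
  from pyspark.sql.types import StringType

  spark = SparkSession.builder.appName("format-udf-sketch").getOrCreate()

  # Hypothetical input with columns: id, raw_status, event_date
  df = spark.read.parquet("/data/landing/events")

  # Simple Spark SQL UDF to normalize a status column
  @udf(StringType())
  def normalize_status(s):
      return (s or "").strip().upper()

  cleaned = df.withColumn("status", normalize_status(col("raw_status"))).drop("raw_status")

  # Write Snappy-compressed ORC, partitioned by date, for efficient HDFS storage
  (cleaned.write
      .mode("overwrite")
      .option("compression", "snappy")
      .partitionBy("event_date")
      .orc("/data/curated/events_orc"))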

Environment: Cloudera CDH 5.x, Hadoop, Linux, IBM DB2, HDFS, YARN, Impala, Pig, Hive, Sqoop, Spark, Scala, HBase, MapReduce, Hadoop Data Lake.

Confidential, Cary, NC

Big Data Hadoop Developer

Responsibilities:

  • Used PySpark DataFrames to read text, CSV, and image data from HDFS and Amazon S3.
  • Worked closely with data scientists to build predictive models using PySpark.
  • Developed Spark scripts using Scala shell commands as per the requirements.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDDs/MapReduce in Spark 1.6 for data aggregation and queries, and wrote data back into the OLTP system through Sqoop.
  • Implemented partitioning, dynamic partitions, and bucketing in Hive.
  • Used Spark Streaming APIs to perform the necessary transformations and actions on the fly for building the common learner data model, which receives data from Kafka in near real time and persists it into Cassandra.
  • Cleaned input text data and extracted features using PySpark machine learning feature extraction (see the training sketch after this list).
  • Trained models using historical data stored in HDFS and Amazon S3.
  • Used Spark Streaming to load the trained model, predict on real-time data from Kafka, and store the results in MongoDB.
  • Fully automated job scheduling, monitoring, and cluster management, with no human intervention, using Webflow.
  • Built Apache Spark as a web service using Flask.
  • Migrated Python scikit-learn machine learning code to DataFrame-based Spark ML algorithms.
  • Responsible for developing a data pipeline on Amazon AWS to extract data from web logs and store it in HDFS.
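
A condensed, hypothetical PySpark ML sketch of the feature-extraction and model-training flow described above; the S3 paths, column names, and choice of classifier are placeholders rather than project specifics.

  from pyspark.sql import SparkSession
  from pyspark.ml import Pipeline
  from pyspark.ml.feature import Tokenizer, HashingTF, IDF
  from pyspark.ml.classification import LogisticRegression

  spark = SparkSession.builder.appName("train-text-model-sketch").getOrCreate()

  # Hypothetical historical data with a free-text 'text' column and a numeric 'label' column
  train_df = spark.read.csv("s3a://example-bucket/history/", header=True, inferSchema=True)

  # Text feature extraction: tokenize -> term frequencies -> TF-IDF
  tokenizer = Tokenizer(inputCol="text", outputCol="words")
  tf = HashingTF(inputCol="words", outputCol="tf")
  idf = IDF(inputCol="tf", outputCol="features")
  lr = LogisticRegression(featuresCol="features", labelCol="label")

  model = Pipeline(stages=[tokenizer, tf, idf, lr]).fit(train_df)

  # Persist the fitted pipeline so a separate streaming job can load it,
  # score records arriving from Kafka, and write the results to MongoDB
  model.write().overwrite().save("s3a://example-bucket/models/learner_v1")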

Environment: Spark Core, Spark SQL, Spark Streaming, Spark ML, Python, scikit-learn, pandas DataFrames, AWS, Kafka, Hive, MongoDB, GitHub, Webflow, Amazon S3, Amazon EMR.

Confidential, Eden Prairie, MN

Big Data Hadoop Developer - Health Care

Responsibilities:

  • Implemented Big Data platforms using Cloudera CDH4 for data storage, retrieval, and processing.
  • Involved in installing, configuring, and managing Hadoop ecosystem components such as Hive, Pig, Sqoop, and Flume.
  • Responsible for loading data from a Teradata database into a Hadoop Hive data warehousing layer and performing data transformations using Hive.
  • Partitioned the collected patient data by disease type and prescribed medication to improve query performance.
  • Used Hive Query Language (HQL) to analyze the data and identify issues and behavioral patterns.
  • Configured the Apache Sentry server to authenticate users and authorize access to the services they were configured to use.
  • Worked on Kafka when dealing with raw data, transforming it into new Kafka topics for further consumption.
  • Installed and configured the Hortonworks Sandbox as part of a POC involving a Kafka-Storm-HDFS data flow.
  • Log data from web servers across the environments is pushed into the associated Kafka topic partitions; Spark SQL is used to calculate the most prevalent diseases in each city from this data (see the sketch after this list).
  • Implemented ZooKeeper for job synchronization.
  • Deployed the NoSQL database HBase to store job outputs.
  • Supported the operations team in Hadoop cluster maintenance activities, including commissioning and decommissioning nodes and upgrades.
  • Extended the core functionality of the Hive language by writing UDFs and UDAFs.
  • Worked on the Oozie workflow engine for job scheduling.
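
An illustrative Spark SQL snippet for the prevalence calculation mentioned above (most common disease per city); the table and column names are assumed for the example.

  from pyspark.sql import SparkSession
  from pyspark.sql import functions as F
  from pyspark.sql.window import Window

  spark = SparkSession.builder.appName("disease-prevalence-sketch").getOrCreate()

  # Hypothetical parsed web-server log records with columns: city, disease, patient_id
  visits = spark.table("health_db.parsed_visits")

  # Count visits per (city, disease), then rank diseases within each city
  counts = visits.groupBy("city", "disease").agg(F.count("*").alias("visit_count"))
  ranked = counts.withColumn(
      "rank", F.row_number().over(Window.partitionBy("city").orderBy(F.desc("visit_count")))
  )

  # Keep only the most prevalent disease for each city
  most_prevalent = ranked.filter(F.col("rank") == 1).drop("rank")
  most_prevalent.show()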

Confidential

Hadoop Developer

Responsibilities:

  • Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
  • Installed and configured Apache Hadoop, Hive, and HBase.
  • Worked on a Hortonworks cluster, which provides an open-source platform based on Apache Hadoop for analyzing, storing, and managing big data.
  • Developed simple and complex MapReduce programs in Java for data analysis on different data formats.
  • Developed multiple MapReduce jobs in Java for data cleaning and processing (a streaming-style sketch follows this list).
  • Used Sqoop to pull data into the Hadoop Distributed File System from RDBMSs and vice versa.
  • Defined workflows using Oozie.
  • Used Hive to create partitions on Hive tables and analyzed this data to compute various metrics for reporting.
  • Created the data model for Hive tables.
  • Developed Linux shell scripts for creating reports from Hive data.
  • Managed and reviewed Hadoop log files.
  • Used Pig as an ETL tool to perform transformations, joins, and pre-aggregations before loading data onto HDFS.
  • Worked on large sets of structured, semi-structured, and unstructured data.
  • Responsible for managing data coming from different sources.
  • Installed and configured Hive, and developed Hive UDFs to extend its core functionality.
  • Responsible for loading data from UNIX file systems to HDFS.
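
The data-cleaning MapReduce jobs above were written in Java; as a compact stand-in, this is a hypothetical Hadoop Streaming version in Python that drops malformed pipe-delimited records and counts the valid ones per source. The record width and delimiter are assumptions for the sketch.

  #!/usr/bin/env python
  # Hypothetical Hadoop Streaming job; run the same script as mapper ("map") and reducer ("reduce"),
  # e.g. via hadoop-streaming with -mapper "clean.py map" -reducer "clean.py reduce".
  import sys

  EXPECTED_FIELDS = 5  # assumed record width

  def map_phase():
      # Emit (source, 1) for every well-formed record; silently drop the rest
      for line in sys.stdin:
          fields = line.rstrip("\n").split("|")
          if len(fields) == EXPECTED_FIELDS and all(f.strip() for f in fields):
              print("%s\t1" % fields[0])

  def reduce_phase():
      # Streaming input arrives sorted by key, so sum counts until the key changes
      current, total = None, 0
      for line in sys.stdin:
          key, value = line.rstrip("\n").split("\t", 1)
          if key != current and current is not None:
              print("%s\t%d" % (current, total))
              total = 0
          current = key
          total += int(value)
      if current is not None:
          print("%s\t%d" % (current, total))

  if __name__ == "__main__":
      reduce_phase() if sys.argv[1:2] == ["reduce"] else map_phase()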

Environment: Apache Hadoop, Hortonworks, MapReduce, HDFS, Hive, HBase, Pig, Oozie, Linux, Java, Eclipse 3.0, Tomcat 4.1, MySQL.

Confidential

Java Developer

Responsibilities:

  • Worked with the business community to define business requirements and analyze the possible technical solutions.
  • Requirement gathering, Business Process flow, Business Process Modeling and Business Analysis.
  • Extensively used UML and Rational Rose to design and develop various use cases, class diagrams, and sequence diagrams.
  • Used JavaScript for client-side validations, and AJAX to create interactive front-end GUI.
  • Developed application using Spring MVC architecture.
  • Developed custom tags for a table utility component.
  • Used various Java, J2EE APIs including JDBC, XML, Servlets, and JSP.
  • Designed and implemented the UI using Java, HTML, JSP and JavaScript.
  • Designed and developed web pages using Servlets and JSPs, and used XML/XSL/XSLT as the repository.
  • Involved in Java application testing and maintenance in development and production.
  • Involved in developing the customer form data tables, and maintained customer support and customer data in MySQL database tables.
  • Involved in mentoring specific projects in application of the new SDLC based on the Agile Unified Process, especially from the project management, requirements and architecture perspectives.
  • Designed and developed View, Model and Controller components implementing MVC Framework.

Environment: JDK 1.3, J2EE, JDBC, Servlets, JSP, XML, XSL, CSS, HTML, DHTML, JavaScript, UML, Eclipse 3.0, Tomcat 4.1, MySQL.
