Hadoop/Spark Developer Resume Sterling, VA - Hire IT People

SUMMARY:

Over 5 years of professional IT experience, including the Big Data ecosystem and Java/J2EE technologies.
Excellent experience with Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
Hands-on experience installing, configuring, and using Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Sqoop, Pig, ZooKeeper, and Flume.
Good exposure to Apache Hadoop MapReduce programming, Pig scripting, distributed applications, and HDFS.
Good knowledge of Hadoop cluster architecture and cluster monitoring.
In-depth understanding of data structures and algorithms.
Experience in managing and reviewing Hadoop log files.
Strong backend experience using Python, Scala, HiveQL, Spark SQL, etc.
Excellent understanding and knowledge of NoSQL databases such as MongoDB, HBase, and Cassandra.
Involved in setting up standards and processes for Hadoop-based application design and implementation.
Developed simple to complex MapReduce streaming jobs in Python that are implemented using Hive and Pig.
Experience importing and exporting data between HDFS and relational database systems using Sqoop.
Experience in object-oriented analysis and design (OOAD) and software development using UML methodology; good knowledge of J2EE and core Java design patterns.
Experience managing Hadoop clusters using Cloudera Manager.
Very good experience in the complete project life cycle (design, development, testing, and implementation) of client-server and web applications.
Experience administering Red Hat Linux: installation, configuration, troubleshooting, security, backup, performance monitoring, and fine-tuning.
Extensive experience working with Oracle, DB2, SQL Server, and MySQL databases.
Good command of scripting, including Shell, Perl, and Python.
Wrote scripts to deploy monitors and checks and to automate critical system administration functions.
Hands-on experience in application development using Java, RDBMS, and Linux shell scripting.
Experience with Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML, and HTML.
Ability to adapt to evolving technology; strong sense of responsibility and accomplishment.

TECHNICAL SKILLS:

Programming Languages: Scala, Python, Java

Hadoop/Big Data: HDFS, MapReduce, Spark, YARN, Kafka, Pig, Hive, Sqoop, Storm, Flume, Oozie, Impala, HBase, Hue, ZooKeeper

NoSQL Technologies: Cassandra, MongoDB, HBase

Big Data Distributions: Hortonworks, Cloudera, Amazon EMR

Java/J2EE Technologies: Servlets, JSP, JDBC, EJB, JAXB, JMS, JAX-RPC, JAX-WS, JAX-RS, Apache CXF

Frameworks: Struts, Spring, Hibernate

Web Technologies: HTML, CSS, JavaScript, jQuery, Ajax, Backbone.js, React, Node.js, Ext JS, Bootstrap

Development Tools: Eclipse, NetBeans, IBM RAD, IntelliJ, Spring Tool Suite

Databases: MySQL, MS SQL Server, IBM DB2, Oracle

Operating Systems: Windows XP/Vista/7/8/10, UNIX, Linux, Mac OS

Build Tools: Ant, Gradle, Maven, Bower

Web/Application Servers: WebSphere, Apache Tomcat, WebLogic, JBoss

PROFESSIONAL EXPERIENCE:

Confidential, Sterling, VA

Hadoop/Spark Developer

Responsibilities:

Developed simple to complex MapReduce streaming jobs in Java for processing and validating data
Developed a data pipeline using MapReduce, Flume, and Sqoop to ingest customer behavioral data into HDFS for analysis
Migrated MapReduce jobs to Spark jobs to discover trends in users' data usage
Implemented Spark jobs using Scala and Spark SQL for faster data processing
Implemented algorithms for real-time analysis in Spark
Imported data from AWS S3 into Spark DataFrames and performed transformations and actions on them
Used Spark for interactive queries, streaming data processing, and integration with popular NoSQL databases for large data volumes
Streamed data in real time using Kafka with Spark
Used the Spark Cassandra Connector to load data to and from Cassandra
Imported data from different data sources into HDFS using Sqoop, performed transformations using Hive and MapReduce, and loaded the results into HDFS
Exported the analyzed data to relational databases using Sqoop for visualization and report generation by the BI team
Collected and aggregated large amounts of log data using Flume and staged it in HDFS for further analysis
Analyzed the data using Hive queries (HiveQL)
Used Hive to analyze partitioned and bucketed data and compute various metrics for reporting, and contributed to Hive performance tuning
Developed HiveQL scripts to denormalize and aggregate the data
Created HBase tables and column families to store the user event data
Wrote automated HBase test cases for data quality checks using HBase command-line tools
Scheduled and executed workflows in Oozie to run Hive and Pig jobs
Used the Tez framework to build high-performance jobs in Pig and Hive
Configured Kafka to read and write messages from external programs
Configured Kafka to handle real-time data
Developed end-to-end data processing pipelines, from receiving data through the Kafka distributed messaging system to persisting it in HBase
Continuously monitored and managed the Hadoop cluster using Cloudera Manager
Developed interactive shell scripts for scheduling various data cleansing and data loading processes

Environment: Hadoop, Spark, MapReduce, Pig, Hive, Sqoop, Oozie, HBase, ZooKeeper, Kafka, Flume, Cloudera Manager, AWS S3, MySQL, Cassandra, multi-node cluster with Linux (Ubuntu), Windows, Unix.

Confidential, Westerville, OH

Hadoop and Spark Developer/Admin

Responsibilities:

Analyzed business needs and functional specifications and mapped them into the design of end-to-end data transformation pipelines.
Created Hive tables and loaded data from Teradata using Sqoop.
Imported and exported data between relational databases and HDFS using Sqoop.
Worked extensively on importing metadata into Hive and migrated existing tables and applications to work on Hive and the AWS cloud.
Implemented Hive generic UDFs to incorporate business logic into Hive queries.
Worked extensively with HiveQL, join operations, and custom UDFs; experienced in optimizing Hive queries.
Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying on the log data.
Designed and implemented Hive and Pig UDFs using Python for evaluation, filtering, loading, and storing of data.
Developed MapReduce jobs for cleaning, validating, and transforming the data.
Debugged and tuned the performance of Pig and Hive scripts by analyzing their joins, grouping, and aggregation.
Wrote Pig scripts to transform raw data from several data sources.
Used different columnar file formats (RCFile, Parquet, and ORC).
Used Cloudera Manager to monitor workload and job performance and for capacity planning.
Took part in building applications using Maven and integrated them with continuous integration servers such as Jenkins to build jobs.
Migrated data from legacy RDBMS databases to HDFS using Sqoop.
Hands-on experience with the whole ETL (Extract, Transform, Load) process.
Developed ETL jobs to normalize this data and publish it in Impala.
Worked with BI teams on generating reports and designing ETL workflows on Tableau.
Worked on NoSQL databases (HBase, MongoDB) for hybrid implementations.
Used Impala to analyze data ingested into HBase and compute various metrics for reporting on the dashboard.
Designed and maintained Oozie workflows to manage the flow of jobs in the cluster.
Worked with the testing teams to fix bugs and ensure smooth and error-free code.
Involved in Agile methodologies, daily Scrum meetings, and sprint planning.

Environment: Hadoop, MapReduce, HDFS, Hive, Python, Java, SQL, Cloudera Manager, Pig, Sqoop, Oozie, HBase, ZooKeeper, MongoDB, PL/SQL, MySQL, DB2, Teradata.

Confidential, Durham, NC

Java/Hadoop Developer

Responsibilities:

Installed and configured Hadoop HDFS, MapReduce, Pig, Hive, and Sqoop.
Involved in implementing high availability and automatic failover infrastructure using ZooKeeper services to eliminate the NameNode single point of failure.
Developed Pig scripts to transform raw data into meaningful data as specified by business users.
Demonstrated proficiency in Shell and Python scripts for file validation and processing, job scheduling, distribution, and automation.
Worked on the Hadoop cluster and used the Hive data querying tool to store and retrieve data.
Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
Developed Spark applications in Java, Scala, or Python.
Used Spark for interactive queries, streaming data processing, and integration with popular NoSQL databases for large data volumes.
Streamed data in real time using Kafka with Spark.
Exported analyzed data to HDFS using Sqoop for generating reports.
Imported and exported data into HDFS and Hive using Sqoop and Flume.
Worked on the Oozie workflow engine to run multiple MapReduce jobs.
Supported MapReduce programs running on the cluster.
Worked with the applications team to install Hadoop updates and upgrades as required.

Environment: Hadoop, MapReduce, HDFS, Pig, Sqoop, Spark, Kafka, Hive, Java, Oracle, Eclipse, and Shell/Python scripting.

Confidential

Hadoop Developer/Admin

Responsibilities:

Analyzed and wrote Hadoop MapReduce jobs using the API, Pig, and Hive.
Gathered the business requirements from the Business Partners and Subject Matter Experts.
Involved in installing Hadoop Ecosystem components under Cloudera distribution.
Responsible for managing data coming from different sources.
Supported MapReduce programs running on the cluster.
Wrote MapReduce jobs using the Java API for data analysis and dimension/fact generation.
Installed and configured Pig and wrote Pig Latin scripts.
Wrote MapReduce jobs using Pig Latin.
Built Spark from source and ran the Pig scripts on Spark rather than as MapReduce jobs for better performance.
Imported data from MySQL into HDFS using Sqoop on a regular basis.
Developed scripts and batch jobs to schedule various Hadoop programs.
Wrote Hive queries for data analysis to meet the business requirements.
Created Hive tables and worked on them using HiveQL.
Utilized Agile Scrum Methodology to help manage and organize a team of 4 developers with regular code review sessions.
Used Storm as an automatic mechanism for retrying attempts to download and manipulate the data when a failure occurs.
Explored Spark to improve the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
Used Storm to analyze large amounts of non-unique data points with low latency and high throughput.
Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.

Environment: Java, MapReduce, Spark, HDFS, Hive, Pig, Linux, XML, MySQL, MySQL Workbench, Java 6, Eclipse, PL/SQL, SQL connector, Subversion.

Confidential

Java/Hadoop Developer

Responsibilities:

Reviewed requirements and analyzed their impact.
Participated in the requirement analysis and design of the application using UML/Rational Rose and Agile methodology.
Involved in developing the application using Core Java, J2EE, and JSPs.
Developed this web-based application in a J2EE framework that uses Hibernate for persistence, Spring for dependency injection, and JUnit for testing.
Used JSP to develop the front-end screens of the application.
Designed and developed several SQL Scripts, Stored Procedures, Packages and Triggers for the Database.
Used Indexing techniques in the database procedures to obtain search results.
Involved in developing a web service client to get client details from third-party agencies.
Developed nightly batch jobs that interfaced with external third-party state agencies.
Developed test scripts for performance and accessibility testing of the application.
Involved in different types of testing, such as unit, system, and integration testing, during the testing phase.
Provided production support to maintain the application.

Environment: Java, J2EE, Struts Framework, JSP, Spring Framework, Hibernate, Oracle, Eclipse, Subversion, PL/SQL, WebSphere, UML, Windows.
