HADOOP/ SPARK DEVELOPER Resume Cleveland, OH - Hire IT People

PROFESSIONAL SUMMARY:

8 years of overall IT experience across all phases of the software development life cycle, including 5 years of experience in Hadoop and the Big Data ecosystem.
Strong experience with and knowledge of Hadoop architecture and its components, such as HDFS, YARN, JobTracker, TaskTracker, NameNode, DataNode and MapReduce.
Good experience with Hadoop ecosystem tools such as Hadoop MapReduce, HDFS, NiFi, Oozie, Hive, Sqoop, Pig, ZooKeeper, Flume, Spark Streaming, Spark SQL, HBase and Cassandra.
Expertise in Hadoop 2.0 and YARN architecture.
Experience working with Hadoop clusters on Cloudera CDH and Hortonworks HDP.
Expertise in writing Hadoop Jobs for analyzing data using MapReduce, Hive and Pig.
Experience in importing and exporting data using Sqoop from HDFS to Relational Database Management Systems (RDBMS) and vice versa.
Developed and implemented Apache NiFi across various environments; wrote QA scripts in Python for tracking files.
Expertise in writing custom UDFs and UDAFs for extending Hive and Pig core functionality.
Experience in implementing various Hadoop file formats and compression techniques such as Sequence, Parquet, ORC, Avro, Z-Zip and text file.
Experienced in using NoSQL databases such as HBase, Cassandra and MongoDB.
Experience in working with different databases such as Oracle, MySQL and MS SQL Server.
Experience in writing UNIX shell and Bash scripts.
Good experience in implementing advanced procedures such as text analytics and processing using in-memory computing capabilities with Apache Impala and Scala.
Experience in creating RDDs and DataFrames for the required data and performing transformations using Spark RDDs and Spark SQL.
Used Spark Structured Streaming to perform necessary transformations.
Experience in writing producers/consumers and creating messaging-centric applications using Apache Kafka.
Hands-on experience with Amazon Web Services (AWS) provisioning tools such as EC2, Simple Storage Service (S3) and Elastic MapReduce (EMR).
Extensive experience in Java development using J2SE and J2EE technologies such as Servlets, Spring, Hibernate, JSP and JDBC.
Experienced in Java components such as the Collections Framework, exception handling, multithreading and the I/O system.
Experience in SOA using SOAP and RESTful web services.
Experience in working with Waterfall and Agile development methodologies.
Proficiency in developing secure enterprise Java applications using technologies such as X-Servlets, Maven, Hibernate, XML, HTML, CSS and version control systems.
Ability to learn and adapt quickly to new tools and environments, with strong communication and analytical skills.

TECHNICAL SKILLS:

Big Data Ecosystems: Hadoop (HDFS & MapReduce), Pig, Hive, HBase, ZooKeeper, Sqoop, Flume, Kafka, Apache Spark, Impala, Oozie.

Databases: Oracle, SQL Server, MySQL.

NoSQL Databases: HBase, Cassandra, MongoDB.

Hadoop Distributions: Cloudera, Hortonworks.

Cloud: AWS, Azure.

Languages: Java, Java SE, Java J2EE, Scala, Python, C.

Web Technologies: JavaScript, jQuery, Bootstrap, AJAX, XML, CSS, HTML, AngularJS.

Web Services: REST, SOAP, JAX-WS, JAX-RPC, JAX-RS, WSDL, Axis2, Apache HTTP, CVS, SVN.

IDE: Eclipse, NetBeans, IntelliJ.

Operating Systems: macOS, Linux, Windows.

PROFESSIONAL EXPERIENCE:

HADOOP/ SPARK DEVELOPER

Confidential, Cleveland, OH

Responsibilities:

Responsible for building scalable distributed data solutions using Hadoop.
Performed ETL: data cleansing, transformation and preparation of data for reporting tools.
Developed Spark and Hive jobs to apply rules and logic and transform data.
Involved in converting Hive/SQL queries into Spark transformations using Spark DataFrames and Scala.
Implemented Spark applications in Scala using higher-order functions for both batch and interactive analysis requirements.
Used Spark Structured Streaming to perform transformations in a data lake that reads data from Kafka and writes it to HDFS.
Created a Spark Streaming task to import live data from Kafka sources and implemented analysis models.
Responsible for handling large datasets using repartition, coalesce, broadcast variables and Spark's in-memory capabilities.
Converted row-oriented Hive external tables into columnar, Snappy-compressed Parquet tables with key-value pairs. Also worked with other file formats such as CSV and text.
Used UUIDs and MD5 hashes for checksums and for identifying deltas.
Applied transformations on data ingested by the Informatica team as per business requirements.
Used JDBC connectors to access reference and lookup tables in Oracle RDBMS.
Wrote ad hoc queries in Hive for orchestration and unit testing.
Created and scheduled Control-M jobs to run multiple Hive and Spark jobs that are triggered independently based on time and data availability.
Implemented workflows using the Apache Oozie framework to automate tasks.
Built on-premises end-to-end data pipelines.
Assisted in setting up an Amazon EMR cluster and adding roles in AWS IAM for the Disaster Recovery (DR) cluster.
Created business-ready views on top of the master table and replicated data into Amazon S3.
Created reports in Tableau to visualize the resulting data sets and tested the native Drill, Impala and Spark connectors.
Used JIRA for task/defect tracking and SVN for version control.

Environment: Hadoop, Cloudera, HDFS, Hive, Oozie, Spark SQL, Sqoop, Control-M, Scala, Informatica, Tableau, Shell Scripting, Python, Oracle, AWS.

HADOOP DEVELOPER

Confidential, Minneapolis, MN

Responsibilities:

Interacted with business users to identify process metrics, key dimensions and measures; involved in the complete life cycle of the project.
Wrote Hive queries in Hive Query Language (HQL) to analyze data in the Hive warehouse.
Developed MapReduce jobs in Java for data cleaning and preprocessing.
Good knowledge of using Apache NiFi to automate data movement.
Used MapReduce to ingest customer behavioral data and financial histories into HDFS.
Used Pig as an ETL tool for transformations and pre-aggregations before storing data in HDFS.
Defined the data flows within the Hadoop ecosystem, directed the team in implementing them, and exported result sets from Hive to MySQL using shell scripts.
Handled importing of data from various data sources and performed transformations.
Involved in creating, partitioning and bucketing tables.
Configured Flume agents on different data sources to capture the streaming log data.
Implemented Amazon EMR for processing big data across a Hadoop cluster of virtual servers on EC2 and S3.
Worked with data formats such as Avro, Parquet and ORC and compression formats such as Snappy and Z-zip.
Implemented a proof of concept (POC) for persisting clickstream data with Apache Kafka.
Optimized existing algorithms in Hadoop using Spark SQL.
Troubleshot and resolved migration and production issues.

Environment: Hadoop, HDFS, Hive, Sqoop, Java, Spark, AWS, Hortonworks, Kafka, Cassandra, UNIX, Tableau.

HADOOP DEVELOPER

Confidential, New York City, NY

Responsibilities:

Imported data into HDFS and exported it back using Sqoop.
Worked on loading and transforming large sets of structured, semi-structured and unstructured data into the Hadoop system.
Responsible for managing data coming from different data sources.
Developed simple and complex MapReduce programs in Java for data analysis.
Loaded data from various data sources into HDFS using Flume.
Implemented partitioning, dynamic partitions and buckets in Hive.
Developed Java MapReduce programs for the analysis of sample log files stored in the cluster.
Created Hive tables and was involved in loading data and writing Hive UDFs.
Responsible for spooling data from DB2 sources to HDFS using Sqoop.
Created Hive tables and provided analytical queries for business user analysis.
Extensive knowledge of Pig scripts using bags and tuples.
Created partitioned and bucketed tables in Hive for granularity and optimization of HiveQL queries.
Involved in identifying job dependencies to design workflows for Oozie and resource management for YARN.
Captured data from existing databases that provide SQL interfaces using Sqoop.
Involved in loading data from the UNIX file system to HDFS.
Installed and configured Pig and Hive and wrote Pig and Hive UDFs.
Involved in creating Hive tables, loading them with data and writing Hive queries, which run internally as MapReduce jobs.

Environment: Cloudera, HBase, Java, Hive, Pig, Sqoop, Oozie, Oracle, SVN, Kafka, GitHub, JIRA, Talend.

JAVA DEVELOPER

Confidential

Responsibilities:

Extensively involved in different stages of the Agile development cycle, including detailed analysis, design, development and testing.
Implemented the back-end business logic using core Java technologies including Collections, Generics, exception handling, Java Reflection and Java I/O.
Wrote Spring annotation-based configuration to define beans and view resolvers and to configure Spring beans, their dependencies and the services they need.
Used Spring IoC to implement dynamic dependency injection and Spring AOP to implement crosscutting concerns such as transaction management.
Wrote mapping configuration files to implement ORM mappings in the persistence layer.
Extended the DAO implementation using Hibernate DAO support.
Wrote Hibernate configuration files to connect to the Oracle database and fetch data.
Implemented the Hibernate query cache using EhCache to improve performance.
Implemented RESTful web services using JAX-RS APIs.
Integrated the JavaMail API to send registration confirmations and monthly statements to users.
Manipulated database data with SQL queries, including setting up stored procedures and triggers.
Implemented front-end features such as web page design, data binding and single-page applications using HTML/CSS, JavaScript, jQuery and AJAX.
Used jQuery libraries to simplify front-end programming. Performed user input validation using JavaScript and jQuery.
Used Node.js and MongoDB to generate trend charts for the application's payment history.
Executed JUnit test cases to test the service layers of the application.
Used JIRA to track projects and Git for version control.

Environment: Java, Spring, JavaMail, JavaScript, HTML, CSS, AJAX, jQuery, JUnit, JIRA, Oracle DB, MongoDB, Git.
