
Hadoop/Spark Developer Resume


SUMMARY

  • 6+ years of experience as an Application Developer, with analytical programming using Scala, PySpark, Django, Flask, AWS, GCP, SQL, Hadoop MapReduce, HDFS, HBase, ZooKeeper, Hive, Sqoop, Pig, Flume, Cassandra, Kafka, Spark, and NiFi.
  • Good understanding of Hadoop architecture and hands-on experience with Hadoop components such as JobTracker, TaskTracker, NameNode, and DataNode, as well as MapReduce concepts and the HDFS framework.
  • Hands-on experience with GCP: BigQuery, GCS buckets, Cloud Functions, Cloud Dataflow, Pub/Sub, Cloud Shell, the gsutil and bq command-line utilities, Dataproc, and Stackdriver.
  • Solid knowledge of and experience with the Cloudera ecosystem (HDFS, YARN, Hive, Sqoop, Flume, HBase, Oozie, Kafka, Pig), data pipelines, and data analysis and processing with HiveQL, Impala, Spark, and Spark SQL.
  • Used Flume, Kafka, and Spark Streaming to ingest real-time or near-real-time data into HDFS (see the sketch after this summary).
  • Analyzed data and provided insights with R and Python pandas.
  • Hands-on experience architecting ETL transformation layers and writing Spark jobs to perform the processing.
  • Good experience in software development with Python (libraries used: Beautiful Soup, NumPy, SciPy, Matplotlib, pandas DataFrames, network, urllib2, and MySQLdb for database connectivity) and IDEs such as Sublime Text, Spyder, and PyCharm.
  • Strong experience analyzing large data sets by writing PySpark scripts and Hive queries.
  • Used Log4j for logging and debugging.
  • Implemented advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
  • Good experience with AWS services (EC2, Lambda, EMR, S3, SNS, etc.).
  • Experience pulling data from Amazon S3 into HDFS.
  • Hands-on experience with VPN, PuTTY, and WinSCP.
  • Experience in data load management, importing and exporting data using Sqoop and Flume.
  • Experience analyzing data using Hive, Pig, and custom MapReduce programs in Java.
  • Experience scheduling and monitoring jobs using Oozie and ZooKeeper.
  • Experienced in writing MapReduce programs and UDFs for both Pig and Hive in Java.
  • Experience working with log files to extract data and copy it into HDFS using Flume.
  • Experience integrating Hive and HBase for effective operations.
  • Experience in Impala, Solr, MongoDB, HBase and Spark.
  • Expertise in Waterfall and Agile (Scrum) methodologies.
  • Proficient in Core Java, J2EE, JDBC, Servlets, JSP, Exception Handling, Multithreading, EJB, XML, HTML5, CSS3, JavaScript, AngularJS.
  • Experienced with code versioning and dependency management systems such as Git, SVN, Maven, and Bitbucket.
  • Used source debuggers and visual development environments.
  • Experience in testing and documenting software for client applications.
  • Wrote code to create single-threaded, multi-threaded, or user-interface event-driven applications, both stand-alone and those that access servers or services.
  • Good experience using data modelling techniques to derive results from SQL and PL/SQL queries.
  • Good working knowledge on Spring Framework.
  • Successfully loaded files to Hive and HDFS from MongoDB, Cassandra, and HBase.
  • Experience working with different databases such as Oracle, SQL Server, and MySQL, and writing stored procedures, functions, joins, and triggers for different data models.
  • Excellent communication, interpersonal, and problem-solving skills; a good team player with a can-do attitude and the ability to communicate effectively with all levels of the organization, including technical staff, management, and customers.
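A minimal sketch of the Kafka-to-HDFS ingestion pattern referenced in the summary above, using Spark's Structured Streaming API in Scala; the broker addresses, topic name, and HDFS paths are illustrative assumptions rather than details from any specific engagement:

```scala
import org.apache.spark.sql.SparkSession

object KafkaToHdfs {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("KafkaToHdfs").getOrCreate()

    // Subscribe to a Kafka topic; requires the spark-sql-kafka connector on the classpath.
    // Broker list and topic name are placeholders.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092,broker2:9092")
      .option("subscribe", "device-events")
      .load()
      .selectExpr("CAST(value AS STRING) AS payload", "timestamp")

    // Land the raw events in HDFS as Parquet, with checkpointing for recovery.
    events.writeStream
      .format("parquet")
      .option("path", "hdfs:///data/raw/device_events")
      .option("checkpointLocation", "hdfs:///checkpoints/device_events")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```

A DStream-based job built on KafkaUtils would serve the same purpose; the structured API is shown here because it keeps the read/write configuration declarative.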

TECHNICAL SKILLS

Big Data Technologies: HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, HBase, Spark, YARN, ZooKeeper, and Hadoop distributions.

Programming languages: Java (5,6,7), Python, Scala

Databases: MySQL, SQL/PL-SQL, MS SQL Server 2012/2016, Oracle 10g/11g/12c

Scripting/Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell, Perl.

NoSQL Databases: Cassandra, HBase, MongoDB, Elasticsearch

Operating Systems: Linux, Windows XP/7/8/10, Mac.

Software Life Cycle: SDLC, Waterfall and Agile models

Cloud Technologies: Amazon EC2, S3, EMR, DynamoDB.

Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SVN, Log4j, SOAP UI, ANT, Maven, Alteryx, Visio.

Data Visualization Tools: Tableau, SSRS.

AWS Services: EC2, S3, EMR, Lambda.

PROFESSIONAL EXPERIENCE

Confidential

Hadoop/ Spark Developer

Responsibilities:

  • Developed various end-to-end Spark applications and data pipelines in Scala, importing data from sources such as HDFS and HBase into Spark RDDs to generate the structured data sets required by the business.
  • Used the Spark Streaming and Spark SQL APIs to process the files.
  • Worked extensively with Sqoop to import and export data between HDFS and relational database systems/mainframes, loading the results into HDFS.
  • Developed Spark scripts using Spark shell commands as per the requirements.
  • Designed AWS architecture and cloud migrations involving AWS EMR, DynamoDB, and Redshift, and implemented event processing using Lambda functions.
  • Implemented Hive UDFs to encapsulate business logic and was responsible for performing extensive data validation using Hive (see the UDF sketch at the end of this role).
  • Worked on NoSQL databases such as HBase, and used Spark for real-time streaming of data into the cluster.
  • Handled importing data from various sources (e.g., Oracle, DB2, and MongoDB) into Hadoop and performed transformations using Hive and MapReduce.
  • Wrote shell scripts to load data into Redshift tables from S3 buckets.
  • Developed analytical component using Scala, Spark and Spark Streaming.
  • Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
  • Installed and configured Hadoop clusters on different platforms such as Cloudera, Pivotal HD, and AWS EMR, along with ecosystem components such as Sqoop, HBase, Hive, and Spark.
  • Developed a Storm topology to ingest data from various sources into the Hadoop data lake.
  • Developed a web application using the HBase and Hive APIs to compare schemas between HBase and Hive tables.
  • Used a JVM monitor to track threads and memory usage of the HBase/Hive schema-check web application.
  • Implemented Spark code to load various JSON and CSV files into Hive tables (see the sketch after this list).
  • Designed and implemented Spark jobs to support distributed processing using PySpark, Hive, and Apache Pig.
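As referenced above, a minimal sketch of loading JSON and CSV files into Hive tables with Spark in Scala; the landing paths, read options, and table names are assumptions for illustration only:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("FileToHiveLoader")
  .enableHiveSupport()   // needed so saveAsTable writes to the Hive metastore
  .getOrCreate()

// Read semi-structured landing files; paths and options are placeholders.
val claims = spark.read
  .option("multiLine", "true")
  .json("hdfs:///landing/claims/")

val members = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("hdfs:///landing/members/")

// Persist into Hive tables for downstream validation and analysis.
claims.write.mode("append").format("parquet").saveAsTable("staging.claims")
members.write.mode("overwrite").format("parquet").saveAsTable("staging.members")
```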

Environment: Apache Hadoop, HDFS, Core Java, Sqoop, Spark, Scala, Kerberos, Jira, Hive, Shell/Perl Scripting, Python, Kafka, Oozie, Airflow, AWS EC2, S3, EMR, RDS, Cloudera, Linux.
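A minimal sketch of a Hive UDF of the kind mentioned in this role, written in Scala against the standard org.apache.hadoop.hive.ql.exec.UDF contract; the masking rule and class name are illustrative assumptions:

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Masks all but the last four characters of an account number,
// e.g. "1234567890" -> "******7890". Returns null for null input.
class MaskAccount extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val value  = input.toString
    val masked = "*" * math.max(0, value.length - 4) + value.takeRight(4)
    new Text(masked)
  }
}
```

Once packaged into a JAR, such a class would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being used in validation queries.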

Confidential, San Diego, California

Hadoop/Scala Developer

Responsibilities:

  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (a minimal sketch follows this list).
  • Developed multiple POCs using Scala, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
  • Provided support and design drafts for L1, L2 and L3 applications.
  • Analyzed the SQL scripts and designed the solution to implement using Scala.
  • Developed analytical components using Scala, Spark, and Spark Streaming.
  • Used the Scala collections framework to store and process complex consumer information.
  • Used Scala functional programming concepts to develop business logic.
  • Imported and exported data into HDFS using Sqoop, Flume, and Kafka.
  • Troubleshot and debugged Hadoop ecosystem runtime issues.
  • Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
  • Created various Hive external and internal tables to store data in Hive.
  • Wrote Pig scripts to run ETL jobs on the data in HDFS and performed further testing.
  • Used Hive to do analysis on the data and identify different correlations.
  • Involved in HDFS maintenance and administration through the Hadoop Java API.
  • Wrote Hive queries for data analysis to meet the business requirements.
  • Automated all the jobs, for pulling data from FTP server to load data into Hive tables, using Oozie workflows.
  • Worked with support teams and resolved operational and performance issues.
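A minimal sketch of the Hive/SQL-to-Spark conversion work referenced above, in Scala; the table, columns, and threshold are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, sum}

val spark = SparkSession.builder()
  .appName("HiveToSpark")
  .enableHiveSupport()
  .getOrCreate()

// Equivalent of a HiveQL aggregation such as:
//   SELECT customer_id, SUM(amount) AS total_spend
//   FROM sales.orders
//   WHERE order_date >= '2016-01-01'
//   GROUP BY customer_id
//   HAVING SUM(amount) > 1000
val totals = spark.table("sales.orders")
  .filter(col("order_date") >= "2016-01-01")
  .groupBy("customer_id")
  .agg(sum("amount").as("total_spend"))
  .filter(col("total_spend") > 1000)

totals.write.mode("overwrite").saveAsTable("analytics.high_value_customers")
```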

Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Cloudera Manager, Pig, Sqoop, ZooKeeper, YARN, PL/SQL, MySQL, HBase.

Confidential

Hadoop Developer

Responsibilities:

  • Wrote MapReduce code to process all the log files against rules defined in HDFS (log files generated by different devices follow different XML rules).
  • Developed and designed application to process data using Spark.
  • Provided support for L1 and L2 applications and played a major role in optimizing them.
  • Designed and developed a system to collect data from multiple portals using Kafka and then process it using Spark.
  • Developed MapReduce jobs and Hive and Pig scripts for the Risk & Fraud Analytics platform.
  • Developed a data ingestion platform using Sqoop and Flume to ingest Twitter and Facebook data for the Marketing & Offers platform.
  • Designed and developed automated processes using shell scripting for data movement and purging.
  • Installed and managed the configuration of a small multi-node Hadoop cluster.
  • Installed and configured other open-source software such as Pig, Hive, Flume, and Sqoop.
  • Developed programs in Java and Scala/Spark to reshape data extracted from HDFS for analysis.
  • Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
  • Imported and exported data into Impala, HDFS, and Hive using Sqoop.
  • Responsible for managing data coming from different sources.
  • Implemented partitioning, dynamic partitions, and buckets in Hive for efficient data access (see the sketch after this list).
  • Developed Hive tables to transform and analyze the data in HDFS.
  • Moved RDBMS data, as flat files generated from various channels, into HDFS for further processing.
  • Developed job workflows in Oozie to automate the tasks of loading the data into HDFS.
  • Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted data from Teradata into HDFS using Sqoop.
  • Wrote script files for processing data and loading it into HDFS.
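As referenced in the partitioning bullet above, a minimal sketch of writing a partitioned, bucketed table from Spark in Scala; the table names, columns, and bucket count are illustrative assumptions:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("PartitionedLoad")
  .enableHiveSupport()
  .getOrCreate()

// Source table is assumed to exist in the metastore.
val txns = spark.table("staging.transactions")

// Partition by load date for partition pruning, bucket by customer id
// so that joins and sampling on customer_id are cheaper.
txns.write
  .mode("overwrite")
  .partitionBy("load_date")
  .bucketBy(32, "customer_id")
  .sortBy("customer_id")
  .saveAsTable("warehouse.transactions")
```

bucketBy and sortBy only take effect when persisting through saveAsTable, which is why the sketch writes to the metastore rather than to a raw HDFS path.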

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, Java (JDK 1.7), SQL, PL/SQL, SQL*Plus, Linux, Sqoop.

Confidential

Java Developer

Responsibilities:

  • Applied Java/J2EE application development skills with strong experience in object-oriented analysis; extensively involved throughout the Software Development Life Cycle (SDLC).
  • Implemented various J2EE standards and the MVC framework, using Struts, JSP, AJAX, and servlets for UI design.
  • Used SOAP/ REST for the data exchange between the backend and user interface.
  • Utilized Java and MySQL from day to day to debug and fix issues with client processes.
  • Developed, tested, and implemented financial-services application to bring multiple clients into standard database format.
  • Assisted in designing, building, and maintaining a database to analyze the life cycle of checking and debit transactions.
  • Created web service components using SOAP, XML and WSDL to receive XML messages and for the application of business logic.
  • Involved in configuring WebSphere variables, queues, data sources, and servers, and in deploying EARs onto the servers.
  • Involved in developing the business Logic using Plain Old Java Objects (POJOs) and Session EJBs.
  • Developed authentication against LDAP via JNDI (see the sketch after this list).
  • Developed and debugged the application using Eclipse IDE.
  • Involved in Hibernate mappings, setting up configuration properties, creating sessions and transactions, and setting up the second-level cache.
  • Involved in backing up databases, creating dump files, and creating DB schemas from dump files; wrote and executed developer test cases and prepared the corresponding scope and traceability matrix.
  • Used JUnit and JAD for debugging and for developing test cases for all the modules.
  • Hands-on experience with Sun ONE Application Server, WebLogic Application Server, WebSphere Application Server, WebSphere Portal Server, and J2EE application deployment technology.
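A minimal sketch of the LDAP authentication via JNDI referenced above; the original work was in Java, but the sketch is written in Scala for consistency with the other examples in this document, and the directory URL and DN pattern are illustrative assumptions:

```scala
import java.util.Hashtable
import javax.naming.{AuthenticationException, Context}
import javax.naming.directory.InitialDirContext

// Attempts a simple bind against the directory; a successful bind means the
// credentials are valid. Server URL and DN pattern are placeholders.
def authenticate(uid: String, password: String): Boolean = {
  val env = new Hashtable[String, String]()
  env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory")
  env.put(Context.PROVIDER_URL, "ldap://ldap.example.com:389")
  env.put(Context.SECURITY_AUTHENTICATION, "simple")
  env.put(Context.SECURITY_PRINCIPAL, s"uid=$uid,ou=people,dc=example,dc=com")
  env.put(Context.SECURITY_CREDENTIALS, password)
  try {
    new InitialDirContext(env).close() // bind succeeded
    true
  } catch {
    case _: AuthenticationException => false // bad credentials
  }
}
```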

Environment: Java (multithreading, collections), JDBC, Hibernate, Struts, Spring, JSP, Servlets, SOAP, Maven, Subversion, JUnit, SQL, Oracle, XML, PuTTY, and Eclipse.
