
Hadoop/Spark Developer Resume


SUMMARY

  • Extensively involved in all phases of the software development life cycle, including analysis, design, development, implementation, testing, and support.
  • Around 8 years of professional experience as a software developer in the design, development, deployment, and support of large-scale distributed systems.
  • Around 4 years of experience with big data tools such as Hadoop, Hortonworks, Spark, MapReduce, Hive, Impala, Oozie, Sqoop, Pig, ZooKeeper, Flume, Kafka, Tableau, and Unix/Linux.
  • Around 1 year of experience using Spark and Spark Streaming for streaming data analysis.
  • Hands-on experience with Spark using SparkContext, Spark SQL, DataFrames, pair RDDs, and YARN; developed Spark code in Python as well as Spark SQL/Streaming for faster processing and testing.
  • Extensive experience in data ingestion, transformation, and analytics using the Apache Spark framework and Hadoop ecosystem components.
  • Expert in the Hive data warehouse tool: creating tables, distributing data through partitioning and bucketing, and writing and optimizing HiveQL queries (see the sketch after this list).
  • Experienced with various file formats such as Avro, Parquet, ORC, and JSON.
  • Strong experience in Hadoop development and in testing big data solutions on Hortonworks.
  • Experienced with source code management tools such as Git and SVN.
  • Experienced in writing downstream and upstream pipelines using object-oriented Python.
  • Used the Kerberos authentication protocol to let nodes communicating over a non-secure network prove their identities to one another securely.
  • Experience using Kafka clusters for data integration and secured cloud platforms such as AWS, performing summarization, querying, and analysis of large datasets stored on HDFS and the Amazon S3 filesystem using Hive Query Language (HiveQL).
  • Experienced in importing and exporting data between RDBMS and HDFS using Sqoop.
  • Extensive use of Spark Core and Spark SQL with Scala and Python.
  • Regular use of authentication protocols such as Kerberos.
  • Loaded large datasets into Spark RDDs and performed in-memory computation to generate output responses.
  • Exposure to Waterfall and Agile software methodologies.
  • Expert problem solver with excellent analytical, troubleshooting, and debugging skills.
  • Worked with the Apache Flume distributed log-collection service.
  • Developed MapReduce programs and libraries in Java 8, extracting several reusable abstract methods.
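
A minimal sketch of the Hive partitioning-and-bucketing design mentioned above, created and queried through Spark SQL with Hive support. The table and column names are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.sql.SparkSession

object HivePartitioningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-partitioning-sketch")
      .enableHiveSupport() // talk to the Hive metastore
      .getOrCreate()

    // Partition by day so queries can prune whole directories;
    // bucket by user_id to speed up joins and sampling.
    spark.sql(
      """CREATE TABLE IF NOT EXISTS web_events (
        |  user_id    BIGINT,
        |  url        STRING,
        |  latency_ms INT
        |)
        |PARTITIONED BY (event_date STRING)
        |CLUSTERED BY (user_id) INTO 32 BUCKETS
        |STORED AS ORC""".stripMargin)

    // Partition pruning: only the 2017-01-01 directory is scanned.
    spark.sql(
      """SELECT url, AVG(latency_ms) AS avg_latency
        |FROM web_events
        |WHERE event_date = '2017-01-01'
        |GROUP BY url""".stripMargin).show()
  }
}
```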

PROFESSIONAL EXPERIENCE

Confidential

Hadoop/Spark Developer

Responsibilities:

  • Involved in writing Spark applications in Scala for ETL and data analysis.
  • Worked extensively on Spark Streaming, Spark SQL, DataFrames, Hive, Impala, Sqoop, Flume, Kafka, Scala, Python, and Java.
  • Developed multiple POCs in PySpark, deployed them on a YARN cluster, and compared the performance of Spark SQL with Hive/Impala and SQL/Teradata.
  • Developed Pig scripts for source data validation and transformation. Worked on Oozie to automate data loading into HDFS and Pig pre-processing of the data.
  • Good understanding of data formats such as Avro, SequenceFile, JSON, Parquet, and XML.
  • Worked on cleansing weblog data with automated Python scripts.
  • Loaded, analyzed, and extracted data to and from an Oracle database using Sqoop.
  • Designed and implemented an ETL framework in Java to load data from multiple sources into Hive, and from Hive into Vertica.
  • Worked on converting Hive/SQL queries into Spark transformations using Spark RDDs and object-oriented Python. Developed and executed shell scripts to automate jobs.
  • Experience working with SQL, HQL, Spark SQL, and shell scripts, as well as views, indexes, stored procedures, and other components of database applications.
  • Used Sqoop to import disk-failure data from Oracle into HDFS for failure analysis.
  • Created Hive tables, loaded them with data, and wrote Hive queries to process the data. Created partitions and applied bucketing on Hive tables, tuning the relevant parameters to improve performance. Developed Pig and Hive UDFs per business use cases.
  • Built streaming pipelines for real-time analytics using Spark Streaming and Kafka (producers and consumers), persisting the analyzed data to the NoSQL database HBase (see the sketch after this list).
  • Worked on Spark data sources, DataFrames, Spark SQL, and Spark Streaming using Scala.
  • Designed and published visually rich, intuitive Tableau dashboards and Crystal Reports for executive decision-making.
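
A minimal sketch of the Spark Streaming + Kafka pipeline described above, using the spark-streaming-kafka-0-10 direct stream. The broker address, topic, and group id are hypothetical; the real job persisted results to HBase through a client or connector rather than printing them.

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaStreamingSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-streaming-sketch")
    val ssc  = new StreamingContext(conf, Seconds(10)) // 10-second micro-batches

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "broker1:9092",            // hypothetical broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "analytics-group"          // hypothetical group
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

    // Count occurrences per message value in each batch; a production job
    // would write these counts to HBase instead of printing them.
    stream.map(record => (record.value, 1L))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```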

Environment: Hadoop, Hortonworks, MapReduce, Spark, Hive, Impala, Pig, Sqoop, Flume, Kafka, ZooKeeper, HBase, SQL, Oozie, Scala, Java, Python, Tableau, UNIX.

Confidential, Columbus, OH

Hadoop Developer/Spark Developer

Responsibilities:

  • More than two years of experience installing, configuring, and testing big data tools.
  • Created end-to-end Spark applications in Scala to perform various kinds of data cleansing.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch after this list).
  • Developed Spark scripts using Scala shell commands as required.
  • Used Akka as a framework to create reactive, distributed, parallel, and resilient concurrent applications in Scala.
  • Used Slick to query and store data in the database in idiomatic Scala, building on the Scala collections framework.
  • Developed a POC in Scala, deployed it on a YARN cluster, and compared the performance of Spark SQL with Hive and Impala.
  • Performed advanced procedures such as text analytics using Spark's in-memory computing capabilities with Scala.
  • Implemented Spark jobs in Scala, using DataFrames and the Spark SQL API for faster data processing.
  • Used the Databricks API from a Scala program to push processed data to Amazon Redshift, a columnar, compressed store that scales linearly and seamlessly.
  • Worked on performance tuning of Spark DataFrame aggregations using dynamic partitioning, creating the temp views needed.
  • Built a POC migrating MapReduce programs to Spark transformations using Spark SQL and Scala.
  • Worked on Spark Streaming and Spark SQL to run applications on Hadoop.
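
A minimal sketch of converting a HiveQL aggregation into an equivalent Spark DataFrame transformation, then exposing the result as a temp view for downstream SQL. The `sales` table and its columns are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object HiveToDataFrameSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-dataframe-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Original HiveQL:
    //   SELECT region, SUM(amount) FROM sales GROUP BY region;
    // Equivalent DataFrame transformation:
    val sales  = spark.table("sales") // hypothetical Hive table
    val totals = sales.groupBy("region")
      .agg(sum("amount").as("total_amount"))

    // Register a temp view so downstream SQL can reuse the result.
    totals.createOrReplaceTempView("region_totals")
    spark.sql("SELECT * FROM region_totals ORDER BY total_amount DESC").show()
  }
}
```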

Environment: Hadoop, Hortonworks, Spark, Hive, Impala, Pig, Sqoop, ZooKeeper, HBase, SQL, Oozie, Solr, Java, ETL, Tableau, UNIX.

Confidential, Charlotte, NC

Hadoop Developer

Responsibilities:

  • Analyzed big data business requirements and translated them into Hadoop-centric solutions.
  • Worked on a Hadoop (Hortonworks) cluster of around 25 nodes.
  • Imported and exported data between Oracle and HDFS/Hive using Sqoop.
  • Implemented custom Hive UDFs to achieve comprehensive data analysis (see the sketch after this list).
  • Configured HBase for in-memory operation and Bloom filters on a per-column-family basis.
  • Worked on streaming log data into HDFS from web servers using Flume.
  • Used Python to glue together several applications and worked with various Python standard libraries.
  • Implemented custom Flume interceptors to filter data as required.
  • Used Hive and Pig to analyze data in HDFS and identify issues and behavioral patterns.
  • Extensively used Pig to communicate with Hive through HCatalog and with HBase through storage handlers.
  • Defined static and dynamic partitions and created internal and external Hive tables for optimized performance.
  • Created Hive tables to store the processed results in tabular format.
  • Configured daily Oozie workflows for data extraction, processing, and analysis.
  • Designed and implemented a MapReduce-based large-scale, parallel relation-learning system.
  • Wrote shell scripts, was involved in performance analysis of the application, and fixed problems/suggested solutions.
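
A minimal sketch of a custom Hive UDF, written here in Scala against the classic `org.apache.hadoop.hive.ql.exec.UDF` base class. The function name and masking logic are hypothetical examples.

```scala
import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// After packaging this class into a jar, it would be registered in Hive with:
//   ADD JAR /path/to/udfs.jar;
//   CREATE TEMPORARY FUNCTION mask_email AS 'MaskEmail';
class MaskEmail extends UDF {
  // Replace the local part of an e-mail address with asterisks,
  // e.g. "alice@example.com" -> "*****@example.com".
  def evaluate(input: Text): Text = {
    if (input == null) return null
    val s  = input.toString
    val at = s.indexOf('@')
    if (at <= 0) input
    else new Text("*" * at + s.substring(at))
  }
}
```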

Confidential, Danbury, CT

Java/J2EE Developer

Responsibilities:

  • Involved in all phases of Software Development Life Cycle (SDLC).
  • Used CVS for version control and Test Director for bug tracking.
  • Involved in developing applications using Java, J2EE, EJB, Struts, JSP, and Servlets.
  • Created UI validations using the Struts validation framework.
  • Strategized and developed enhancements to support the migration process.
  • User training: worked closely with the user community to train them and explain the various features.
  • Developed the database schema and SQL queries for the Oracle 9i database.

Environment: Java, J2EE, JSP, HTML, JavaScript, Oracle, SQL, JDBC, XML, Servlets, IBM WebSphere, Ant, C++, SQL Server.

Confidential

Java/J2EE Developer

Responsibilities:

  • Involved in all phases of Software Development Life Cycle (SDLC).
  • Designed Social Tango, an internal social networking website for the organization.
  • Developed web applications in Social Tango using Core Java, J2EE, multithreading, and JDBC, along with other Java APIs.
  • Worked as a full-stack developer using HTML, CSS, JavaScript, Django, AngularJS, Bootstrap, Python, etc.
  • Used JavaScript and XML to update portions of a web page, reducing bandwidth usage and load time, and added modal dialogs to collect user input and requests.
  • Worked on Asynchronous JavaScript and XML (AJAX) for a faster, more interactive front end.
  • Involved in the design of JSPs and Servlets for navigation among the modules.
  • Implemented security access control on both the client and server sides, including applet and JAR signing.
  • Managed connectivity using JDBC for querying, inserting, and data management, including triggers and stored procedures.
  • Developed SQL queries and PL/SQL stored procedures to retrieve and insert user data across multiple database schemas.
  • Set up Apache and Nginx web servers to manage content on the networking site.
  • Worked on relational databases (MySQL, PostgreSQL) and non-relational databases (MongoDB, Redis, Cassandra) with XML/JSON.
  • For front-end development, worked with media queries and CSS preprocessors such as LESS and SASS to design responsive web pages.
  • Worked with web development tools such as Vagrant and Docker, using Ruby and shell scripts.

Environment: Java, Java Swing, JSP, Servlets, JDBC, Applets, JCE 1.2, RMI, EJB, XML/XSL, VisualAge for Java (VAJ), Visual C++, J2EE.

Confidential

Java Developer

Responsibilities:

  • Developed master screens following a J2EE MVC architecture.
  • Developed interfaces using HTML, JSP pages, and the Struts presentation view.
  • Worked with the Struts framework, configuring web.xml and struts-config.xml accordingly.
  • Developed and implemented Servlets running under JBoss.
  • Used J2EE design patterns and Data Access Objects (DAO) for the business and integration tiers of the project; developed the Java UI using Swing.
  • Used Java Message Service (JMS) for reliable, asynchronous exchange of information between the clients and the customer.
  • Designed and developed message-driven beans that consumed messages from the JMS queue.
  • Developed database interaction code against the JDBC API, making extensive use of SQL query statements and prepared statements (see the sketch after this list).
  • Handled the Java multithreading in the back-end components.
  • Inspected and reviewed quality deliverables such as design documents.
  • Wrote SQL scripts, stored procedures, and SQL*Loader jobs to load reference data.
  • Used the Ant build tool to build the application.
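
A minimal sketch of the JDBC prepared-statement usage described above, shown in Scala for consistency with the earlier sketches (the original code was Java; the java.sql calls are identical). The connection string, credentials, and table are hypothetical.

```scala
import java.sql.DriverManager

object JdbcPreparedStatementSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection details; the original work targeted Oracle.
    val conn = DriverManager.getConnection(
      "jdbc:oracle:thin:@dbhost:1521:ORCL", "app_user", "secret")
    try {
      // A prepared statement is parsed once and bound per execution,
      // which also guards against SQL injection.
      val ps = conn.prepareStatement(
        "INSERT INTO reference_data (code, description) VALUES (?, ?)")
      ps.setString(1, "US")
      ps.setString(2, "United States")
      val rows = ps.executeUpdate()
      println(s"Inserted $rows row(s)")
      ps.close()
    } finally {
      conn.close()
    }
  }
}
```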

Environment: Oracle 9i/8i, SQL, PL/SQL, Perl, SQL*Loader, SQL*Plus, Windows 2000, Windows 7/XP.
