
Spark Developer Resume


Chicago, IL

SUMMARY:

  • Over 6 years of IT experience, with extensive hands-on work in the Hadoop ecosystem and Big Data development; expertise in MapReduce, HDFS, YARN, Hive, HiveQL, Pig, Impala, ZooKeeper, Oozie, Apache Spark, and Zeppelin, along with broad experience in Java-related technologies and the J2EE framework integrated with various database technologies.
  • Experience in managing and analyzing massive datasets on multiple Hadoop distributions, including Cloudera and Hortonworks.
  • Experience with, and a good understanding of, real-time Big Data streaming technologies such as Apache Spark and Apache Kafka.
  • Expertise in data analytics and root cause analysis using Hive, Pig, Spark SQL, and DataFrames.
  • Working experience with the various components of Apache Spark, including RDDs, DataFrames, and paired RDDs, and in implementing transformations and actions using Scala.
  • Experience in data ingestion from external data sources such as MySQL and Oracle into HDFS using Flume and Sqoop, in file formats including text, XML, and JSON.
  • Developed UDFs in Java for Hive tables to provide customized functionality not covered by Hive's built-in functions.
  • Extensive working knowledge of Hadoop architecture and of maintaining and monitoring components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, and the MapReduce programming paradigm.
  • Good knowledge of and experience with NoSQL databases such as HBase and Cassandra, as well as the Amazon Redshift data warehouse.
  • Experience working with the Hadoop ecosystem integrated with the AWS cloud platform, using services such as Amazon EC2 instances, S3 buckets, and Redshift.
  • Experience in implementing SDLC phases using object-oriented concepts in Java/J2EE with frameworks such as Spring and Hibernate, and in implementing database connections using JDBC.
  • Experience working with RESTful and SOAP web services in Java for data transfer.
  • Good hands-on experience in the configuration, deployment, and management of enterprise applications on application servers such as WebSphere and JBoss, and on web servers such as Apache Tomcat.
  • Expertise in working with databases such as MySQL, Oracle, and Microsoft SQL Server, including data manipulation and analysis using SQL queries and stored procedures.
  • Strong understanding of data warehouse concepts such as ETL and star and snowflake schemas, with data modeling experience covering normalization, business process analysis, and dimensional, physical, and logical data modeling.
  • Experience in developing applications using Waterfall and Agile methodologies, with a clear understanding of the roles and responsibilities of a sprint team member.
  • Experience in code version control using Git, maintaining repositories according to best practices.

TECHNICAL SKILLS:

Programming Languages: C, Core Java, J2EE, Scala.

Databases: Oracle, MySQL, SQL Server.

Big Data Technologies: Hadoop 2.x, HDFS, MapReduce v2, Hive 1.2.1, Pig 0.12/0.14, HBase 0.98.0, Impala, Hue, Sqoop, Kafka 0.8.2, Oozie, Flume 1.5.2, Zookeeper 3.4.6, Spark 1.6, Cloudera CDH 4/CDH 5, Hortonworks HDP 2.4/2.5/2.6.

Hadoop Paradigms: MapReduce, YARN, in-memory computing, high availability, real-time streaming.

NoSQL Databases: HBase, Cassandra.

Scripting and Query Languages: UNIX shell scripting, SQL, PL/SQL.

Operating Systems: Windows, UNIX, Linux distributions (CentOS, Ubuntu).

Other Tools: Eclipse, Tableau 10.1, JUnit, SBT, Maven.

Amazon Web Services: EC2 instances, S3 buckets, Amazon Redshift.

PROFESSIONAL EXPERIENCE:

Confidential, Chicago, IL

Spark Developer

Responsibilities:

  • Handled more than 2.5 TB of daily data from point-of-sale systems.
  • Wrote Scala code to process sales data on the Spark 1.6 framework to derive useful business insights.
  • Performed data ingestion from various sources through a Kafka data pipeline.
  • Worked on a data pipeline consisting of Kafka together with Apache Spark.
  • Applied multiple levels of transformations and actions, such as geographical drill-down of data on the fly, before feeding the data into the Spark system.
  • Used Spark features such as SparkContext, Spark SQL, DataFrames, and pair RDDs to improve the performance of and optimize existing algorithms in Hadoop.
  • Created a Spark application to load data into a dynamic-partition-enabled Hive table based on geographical location, which enabled faster data processing.
  • Used the Databricks spark-xml plug-in to parse incoming XML data and to generate the required XML output.
  • Filtered data according to tags in the XML files that refer to the various events in a point-of-sale transaction.
  • Wrote Hive jobs to parse logs and structure them in tabular format to facilitate effective querying of the log data.
  • Optimized datasets by creating dynamic partitioning in Hive.
  • Developed business-specific custom UDFs in Hive.
  • Tuned the performance of Spark applications by setting the right batch interval, the correct level of parallelism, and appropriate memory settings.
  • Responsible for maintaining the entire Big Data system on Amazon Redshift.
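The geographical drill-down described above can be sketched with plain Java collections, so the example stands alone without a Spark cluster; in Spark the same roll-up would be a reduceByKey on a pair RDD or a groupBy on a DataFrame. The Sale record, region names, and amounts are illustrative, not from the actual project.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Minimal stand-in for one point-of-sale record (hypothetical fields).
record Sale(String region, String store, double amount) {}

public class RegionDrillDown {
    // Roll sales up to region level; analogous to
    // sales.map(s -> (s.region, s.amount)).reduceByKey(_ + _) in Spark.
    static Map<String, Double> totalsByRegion(List<Sale> sales) {
        return sales.stream().collect(
            Collectors.groupingBy(Sale::region,
                Collectors.summingDouble(Sale::amount)));
    }

    public static void main(String[] args) {
        List<Sale> sales = List.of(
            new Sale("Midwest", "Chicago-01", 120.0),
            new Sale("Midwest", "Chicago-02", 80.0),
            new Sale("West", "Denver-01", 50.0));
        // Prints the per-region totals computed above.
        System.out.println(totalsByRegion(sales));
    }
}
```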

Environment: Apache Hadoop 2.6.0, YARN, HDFS, Scala 2.10.5, Spark Core 1.6, Spark SQL, Java API, Hive 1.2.1, Kafka 0.8.2, Eclipse Neon, Hortonworks HDP 2.4.3, Zookeeper

Confidential, Omaha, NE

Hadoop Developer

Responsibilities:

  • Involved in data acquisition, pre-processing, and exploration of the business's digital data.
  • Worked with data scientists and the QA team to understand, design, and develop end-to-end data flow requirements.
  • Used Sqoop to acquire the required data from the source into HDFS using an incremental approach.
  • Used HiveQL and Pig scripts for data analysis.
  • Loaded data into Hive and Impala tables for further analysis.
  • Involved in creating Hive tables and implementing partitioning and bucketing to speed up query processing.
  • Performed various analyses and data trending to provide users with useful insights about customers.
  • Designed Hive UDFs in Java to implement customized functions.
  • Used Sqoop for both import and export of the required data into and out of the Hadoop ecosystem on demand.
  • Involved in continuous enhancements and troubleshooting of problems in production.
  • Involved in designing Oozie workflows for scheduling and orchestrating the system's entire data flow.
  • Closely monitored Hadoop log files to troubleshoot issues.
  • Responsible for monitoring ZooKeeper to ensure smooth coordination among the various components.
  • Responsible for unit testing of deliverables for UAT.
  • Used Tableau for data visualization to identify the influence of various factors.
  • Involved in identifying KPIs.
Environment: HiveQL, MySQL, HIVE 1.2.0, Impala 2.1.0, Eclipse (Neon), Hadoop 2.x, Cloudera CDH 4, Oracle 11g, PL/SQL, Toad 9.6, Flume, PIG, Sqoop 1.4.6, Unix, Tableau 9.3.1, Zookeeper 3.4.8

Confidential - Omaha, NE

ETL Offload - Data Engineer

Responsibilities:

  • Migrated 15 - 20 TB of data from a legacy mainframe system to the Hadoop ecosystem.
  • Worked closely with business analysts and system engineers to convert business requirements into technical requirements.
  • Used Sqoop 1.4.6 import techniques to fetch data from the legacy system.
  • Involved in designing Sqoop scripts and tuning the underlying map tasks to ensure efficient parallelism.
  • Involved in designing HDFS storage with an efficient number of block replicas.
  • Involved in structuring Hive tables to store the data imported by Sqoop.
  • Wrote ETL jobs using HiveQL.
  • Configured periodic incremental imports of data from Oracle into HDFS using Sqoop.
  • Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
  • Used Java to design customized MapReduce programs on demand.
  • Developed custom Java programs from time to time to make required changes to the ETL process.
  • Set up asynchronous inter-cluster Oozie workflows.
  • Followed simple data modeling techniques to make the data flow simpler and faster.
  • Designed and developed Oozie workflows for sequential job execution.
  • Involved in Hadoop cluster administration: building the cluster with different services and upgrading HDP and the Ambari server.
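A sequential Oozie workflow like the ones described above typically chains a Sqoop import into a Hive load step. The fragment below is a minimal sketch of that shape; the app name, node names, table, and properties are placeholders, not the project's actual configuration.

```xml
<workflow-app name="etl-offload" xmlns="uri:oozie:workflow:0.5">
  <start to="sqoop-import"/>
  <!-- Incremental Sqoop import from the source database into a staging dir. -->
  <action name="sqoop-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.4">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <command>import --connect ${jdbcUrl} --table SALES --target-dir ${stagingDir} --incremental append --check-column ID</command>
    </sqoop>
    <ok to="hive-load"/>
    <error to="fail"/>
  </action>
  <!-- Load the staged data into Hive once the import succeeds. -->
  <action name="hive-load">
    <hive xmlns="uri:oozie:hive-action:0.5">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <script>load_sales.hql</script>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>ETL step failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```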

Environment: Java 1.7.0, Oracle Server, HDP 2.4.2, MapReduce2, Hive 1.2.1, Pig 0.16.0, Sqoop 1.4.6, Oozie 4.2.0, Zookeeper 3.4.6, Apache Maven

Confidential

JAVA Developer

Responsibilities:

  • Responsible for gathering requirements from business owners and checking the feasibility of the requirements.
  • Worked with UI designers of the application and developed objects and JAVA entities accordingly.
  • Developed the Java logic to fetch data from the front end and save it to the database, and vice versa.
  • Involved in writing efficient stored procedures in MySQL.
  • Responsible for unit testing and preparing test case documents in adherence to company standards.
  • Involved in giving demos to the business owner and obtaining sign-offs.
  • Responsible for maintaining code versions in Git.
  • Responsible for creating dashboards for senior management.
  • Responsible for learning VB and incorporating it into the application.
  • Involved in upgrading the existing system, which was not very interactive from the patient's point of view.
  • Involved in developing event interfaces where doctors can post their availability.
  • Responsible for integrating attendance reports for both doctors and patients, as required by senior management.

Environment: Java 6, JSF 2.1, PrimeFaces 3.3.1, HTML, CSS, JavaScript, RESTful web services, MySQL

Confidential

JAVA Developer

Responsibilities:

  • Responsible for gathering requirements from business owners and checking the feasibility of the requirements.
  • Used Spring MVC and Hibernate Framework for data handling.
  • Implemented Hibernate API for database connectivity.
  • Developed Hibernate mapping configuration files to provide the relation between java objects and database tables.
  • Involved in UI design of the application and developing the screens as per the approved designs.
  • Developed the Java logic to fetch data from the front end and save it to the database, and vice versa.
  • Wrote JUnit test cases to test the server-side modules.
  • Responsible for writing efficient stored procedures in MySQL.
  • Responsible for unit testing and preparing test case documents in adherence to company standards.
  • Responsible for giving demos to the business owner and obtaining sign-offs.
  • Responsible for maintaining code versions in Git.
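The Hibernate mapping files mentioned above relate an entity class to its table and columns. A minimal hbm.xml sketch is shown below; the class, table, and column names are placeholders rather than the application's real model.

```xml
<?xml version="1.0"?>
<!DOCTYPE hibernate-mapping PUBLIC
  "-//Hibernate/Hibernate Mapping DTD 3.0//EN"
  "http://hibernate.sourceforge.net/hibernate-mapping-3.0.dtd">
<hibernate-mapping>
  <!-- Maps the (hypothetical) Customer entity to the CUSTOMERS table. -->
  <class name="com.example.model.Customer" table="CUSTOMERS">
    <id name="id" column="CUSTOMER_ID">
      <!-- Let the database pick the identity strategy. -->
      <generator class="native"/>
    </id>
    <property name="firstName" column="FIRST_NAME"/>
    <property name="lastName" column="LAST_NAME"/>
  </class>
</hibernate-mapping>
```

With this mapping on the classpath, session.save(customer) persists the object to CUSTOMERS without hand-written SQL.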

Environment: Java 6, Spring MVC, Hibernate, MySQL, Eclipse Helios, Git, SOAP and RESTful web services
