Big Data / Spark Developer Resume
King of Prussia, PA
SUMMARY:
- Over 6 years of experience in the IT industry, including 3+ years as a Hadoop & Spark Developer. A motivated team player focused on problem solving, with strong analytical and communication skills.
- Strong expertise in Hadoop ecosystem components (HDFS, MapReduce, YARN, HBase, Pig, Sqoop, Spark and Hive) and AWS, with good knowledge of big data architecture, working principles, development, validation and data cleansing.
- Expert in Hortonworks Big Data distributions.
- Strong experience migrating data between external databases and HDFS using Sqoop.
- Highly experienced in single-node and multi-node cluster configurations.
- Proficient in data extraction, transformation and loading (ETL) with Hive and Pig.
- Experienced in Oozie for Big Data job workflow scheduling and monitoring.
- Hands-on experience with NoSQL databases such as Cassandra and MongoDB.
- Strong experience in RDBMS technologies such as Oracle and MySQL.
- Expert in SQL, with solid experience writing joins, nested queries and unions.
- Good understanding of ETL and data warehousing technologies.
- Involved in all phases of data warehouse development and ETL implementation, supporting both new and existing applications.
- Experience with Unix/Linux and Windows operating systems.
- Strong programming skills in designing and implementing applications using Core Java, J2EE, JDBC, JSP, Servlets, HTML, JavaScript, Spring Framework, Spring Batch, Spring AOP and Struts.
- Knowledge of the Java Virtual Machine (JVM) and multithreaded processing.
- Developed web services modules for integration using SOAP and REST.
- Sound knowledge of designing data warehousing applications using tools such as Teradata, Oracle and SQL Server.
- Strong understanding of Agile Scrum and Waterfall SDLC methodologies.
- Strong communication, collaboration and team-building skills, with the ability to grasp new technical concepts quickly and apply them productively.
- Adept at analyzing information system needs, evaluating end-user requirements, designing custom solutions and troubleshooting information systems.
TECHNICAL SKILLS:
Hadoop Ecosystem: HDFS, MapReduce, YARN, Hive, Pig, Sqoop, Spark, Kafka, AWS
Methodologies: Agile Scrum, Waterfall SDLC
Programming Languages: Java, Python, Scala, JavaScript
Frameworks: Spring, Hibernate
Databases: Oracle, MySQL, MongoDB, Cassandra
Operating Systems: Windows, Linux
PROFESSIONAL EXPERIENCE:
Confidential, King of Prussia, PA
Big Data / Spark Developer
Responsibilities:
- Loaded data into HDFS from web servers, RDBMS and data APIs, using Sqoop 1.4.6 for full loads and Flume 1.5.0 for incremental loads.
- Developed Oozie 3.0 workflows to automate loading data into HDFS and pre-processing it with Pig 0.16.0 on a monthly basis.
- Used Spark SQL to load data from Hive 2.0.0 tables into RDDs for complex queries and analytics on data in the data lake.
- Improved the performance of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and RDDs.
- Made extensive use of Spark transformations and DataFrames for data wrangling and for real-time ingestion of various file formats into RDDs.
- Converted Cassandra, Hive and SQL queries into Spark transformations using RDDs and Scala.
- Transferred data from Hive tables into Cassandra 3.0 for real-time analysis.
- Created both managed and external Hive tables, using dynamic partitioning and bucketing to optimize performance.
- Exported data from Impala to the Tableau 10.0 reporting tool.
- Continuously monitored the Hadoop cluster using Cloudera Manager and wrote shell scripts to automate email notifications to the business team.
- Used Spark 2.0.0 on Hortonworks with Hadoop YARN 2.5.2 to analyze data in Hive.
- Enabled concurrent access to Hive tables through shared and exclusive locking, backed by the cluster's ZooKeeper service.
- Assisted in preparing functional specifications to meet customer development needs.
- Followed Agile methodology and participated in Scrum meetings to track, optimize and tailor features to customer needs.
Environment: Hadoop, Java, UNIX, HDFS, MapReduce, Flume, Hive, Spark, Sqoop, Cassandra, Oozie, Pig, Impala, Tableau
Confidential, Red Bank, NJ
Big Data / Hadoop Developer
Responsibilities:
- Designed, built and maintained Big Data workflows/pipelines to process billions of records into and out of the data lake and Identity Graph.
- Implemented recommendation logic for various clustering and classification algorithms in Java.
- Created and maintained the Hive warehouse used for analysis.
- Ran various Hive queries on data dumps and generated aggregated datasets for further analysis by downstream systems.
- Assembled large, complex data sets in Cassandra to meet business requirements.
- Migrated user data into HDFS on a weekly basis using Apache Sqoop 1.4.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Ran clustering and user recommendation agents on user weblogs and profiles to generate the interest matrix.
- Built continuous integration and automated deployment using Bash scripts in a Linux environment.
- Wrote test cases for the new MapReduce jobs.
- Prepared the data for consumption by formatting it for upload to the UDB system.
- Engaged in application design and data modeling discussions.
- Participated in capacity monitoring and planning.
- Wrote unit tests with high code coverage and fine-tuned application performance.
- Worked in an onsite-offshore environment.
Environment: Apache Hadoop, HDFS, Hive, MapReduce, Java, Spark, Sqoop, Cassandra
Confidential, New York, NY
Big Data / Hadoop Developer
Responsibilities:
- Implemented and tested data processing pipelines (Hadoop 2.5.2 and Spark 2.0) and data mining algorithms on AWS clusters.
- Contributed to building and deploying high-performance production infrastructure to support data warehousing, real-time ETL and batch big data processing with Kafka and Spark 2.0.
- Used Apache Sqoop to transfer user data into HDFS.
- Ran various Hive queries on data dumps and generated aggregated datasets for further analysis by downstream systems.
- Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
- Managed and reviewed Hadoop log files.
- Collaborated with data scientists to design and develop processes to further business unit and company-wide data science initiatives on a common data platform.
- Examined data quality, detected data and analytics quality issues, and implemented fixes and data audits to prevent or capture such issues.
- Translated business analytic needs into enterprise data models and ETL processes to populate them.
Environment: Apache Hadoop, Kafka, Spark, Sqoop, HDFS, Hive, MapReduce, MySQL
Confidential, Parsippany, NJ
Big Data / Hadoop Developer
Responsibilities:
- Created a data model for data management and processing.
- Wrote Hive queries to categorize health information data.
- Designed and created Hive external tables to manage data from relational databases.
- Used HiveQL scripts to create, load and query tables in Hive.
- Transferred health data from SQL databases into HDFS using Sqoop and imported flat files in various formats into HDFS.
- Supported MapReduce programs running on the cluster.
- Assisted in system integrity maintenance of components (HDFS, MR, HBase, and Hive).
- Monitored system health logs and responded to any warning or failure conditions.
- Presented data and dataflow using Talend for reusability.
Environment: Apache Hadoop, HDFS, Hive, MapReduce, Java, Sqoop, MySQL, Talend
Confidential, NJ
Java Developer
Responsibilities:
- Worked with the Scrum Master, team members and QA teams to clarify requirements, develop realistic development plans and contribute to the successful delivery of the project.
- Extensively used Core Java, Servlets, JSP and XML.
- Designed and developed user interface screens using Struts 1.2, JSP and Servlets.
- Implemented and integrated Spring MVC with Struts for developing UI screens.
- Implemented Spring ORM with Hibernate and mapped entities to the MySQL database using Hibernate annotations.
- Performed unit testing for all the components using JUnit.
- Deployed the application on the Apache Tomcat application server.
- Used Git for version control and the Eclipse IDE for development.
Environment: Java, Servlets, JSP, Struts, Spring MVC, Spring Core, Spring ORM, Hibernate, MySQL, JUnit, RESTful web services, Git, Apache Tomcat, Eclipse IDE
Confidential
Java Developer
Responsibilities:
- Involved in the complete software development life cycle of the application from requirement analysis to testing.
- Developed modules based on the Spring MVC architecture.
- Worked with Spring and RESTful web services to interact with objects created by ORM tools.
- Implemented business logic using Servlets and session beans.
- Developed RESTful web services using Java and Spring Boot.
- Wrote complex SQL queries using joins, subqueries and correlated subqueries to retrieve data from the database.
- Prepared the Functional, Design and Test case specifications.
- Performed unit, system and integration testing.
- Developed the user interface using JavaScript, JSP, HTML and CSS to provide interactive, cross-browser functionality for a complex user interface.
- Developed unit test cases using JUnit.
- Provided technical support for production environments, resolving issues and analyzing defects; resolved high-priority defects on schedule.
Environment: Java, JavaScript, Spring, HTML, CSS, SQL, JUnit
