Spark/Hadoop Developer Resume
Alpharetta, GA
SUMMARY:
- 8+ years of strong experience in enterprise application development using Java and in data analytics and data engineering using the Hadoop ecosystem.
- In depth understanding of Distributed Systems Architecture and Parallel Processing Frameworks.
- Strong experience using HDFS, MapReduce, Hive, Pig, Spark, Sqoop, Oozie, and HBase.
- Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance.
- Experienced working with various Hadoop Distributions (Cloudera, Hortonworks, MapR, Amazon EMR) to fully implement and leverage new Hadoop features.
- Experience in developing Spark applications using Spark RDD, Spark SQL, and DataFrame APIs.
- Worked with real-time data processing and streaming techniques using Spark Streaming and Kafka.
- Experience in moving data into and out of HDFS and relational database systems (RDBMS) using Apache Sqoop.
- Experience developing Kafka producers and consumers for streaming millions of events per second (a minimal producer sketch follows this summary).
- Significant experience writing custom UDFs in Hive and custom InputFormats in MapReduce.
- Knowledge of job workflow management and coordination tools like Oozie.
- Strong experience productionizing end-to-end data pipelines on the Hadoop platform.
- Experience working with NoSQL database technologies, including MongoDB, Cassandra and HBase.
- Extensive experience in importing/exporting data to/from RDBMS and HDFS using Apache Sqoop.
- Expertise in working with the Hive data warehouse infrastructure: creating tables, distributing data through partitioning and bucketing, and developing and tuning HQL queries.
- Involved in creating Hive tables, loading them with data, and writing ad hoc Hive queries that run internally on MapReduce and Tez.
- Strong understanding of real time streaming technologies Spark and Kafka.
- Strong understanding of logical and physical database models and entity-relationship modeling.
- Replaced existing MR jobs and Hive scripts with Spark SQL & Spark data transformations for efficient data processing.
- Experience with Software development tools such as JIRA, Play, GIT.
- Good understanding of the Data modeling (Dimensional & Relational) concepts like Star-Schema Modeling, Snowflake Schema Modeling, Fact and Dimension tables.
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Strong understanding of the Java Virtual Machine and multithreaded processing.
- Experience in writing complex SQL queries, creating reports and dashboards.
- Expertise in handling ETL tools like Informatica.
- Excellent analytical, communication and interpersonal skills.
- Experienced in using Agile methodologies including Extreme Programming (XP), Scrum, and Test-Driven Development (TDD).
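A minimal sketch of the kind of high-throughput Kafka producer referenced above, written in Scala against the standard kafka-clients API; the broker address, topic name, and JSON payload are placeholder assumptions, not details taken from any specific project.

```scala
import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

object EventProducerSketch {
  def main(args: Array[String]): Unit = {
    val props = new Properties()
    // Broker address and topic name are placeholders.
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    // Batching and linger settings trade a little latency for throughput
    // when pushing a high volume of small events.
    props.put("acks", "1")
    props.put("linger.ms", "5")
    props.put("batch.size", "65536")

    val producer = new KafkaProducer[String, String](props)
    try {
      (1 to 1000).foreach { i =>
        val record = new ProducerRecord[String, String]("events", s"key-$i", s"""{"event_id": $i}""")
        producer.send(record) // asynchronous; a callback could be added for error handling
      }
    } finally {
      producer.flush()
      producer.close()
    }
  }
}
```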
TECHNICAL SKILLS:
Operating Systems: Linux (Ubuntu, CentOS), Windows, Mac OS
Hadoop Ecosystem: Hadoop, MapReduce, YARN, HDFS, Spark, HBase, Impala, Hive, Pig, Sqoop, Oozie, Flume, ZooKeeper, NiFi
NOSQL Databases: HBase, Cassandra, MongoDB
Programming Languages: C, Scala, Core Java, J2EE (Servlets, JSP, JDBC, JavaBeans, EJB), C#, ASP.NET
Frameworks: MVC, Struts, Spring, Hibernate
Web Technologies: HTML, CSS, XML, JavaScript, Maven
Scripting Languages: JavaScript, UNIX shell, Python, R
Databases: Oracle 11g, MS Access, MySQL, SQL Server 2000/2005/2008/2012, Teradata
SQL Server Tools: SQL Server Management Studio, Enterprise Manager, Query Analyzer, Profiler, Export & Import (DTS).
IDE: Eclipse, Visual Studio, IDLE, IntelliJ
Web Services: RESTful, SOAP
Tools: Bugzilla, QuickTestPro (QTP) 9.2, Selenium, Quality Center, Test Link, TWS, SPSS, SAS, Documentum, Tableau, Mahout
Methodologies: Agile, UML, Design Patterns
PROFESSIONAL EXPERIENCE:
Confidential, Alpharetta, GA
Spark/Hadoop Developer
Responsibilities:
- Developed multiple POCs using PySpark, deployed jobs on a YARN cluster, and compared the performance of Spark SQL with Hive/Impala and SQL/Teradata.
- Developed Pig scripts for source data validation and transformation; automated data loading into HDFS and Pig pre-processing using Oozie.
- Set up Amazon S3 storage and worked on transferring data from Kafka topics into AWS S3.
- Designed and implemented an ETL framework using Java and Pig to load data from multiple sources into Hive, and from Hive into Vertica.
- Utilized Sqoop, Kafka, Flume, and the Hadoop FileSystem APIs to implement data ingestion pipelines.
- Worked on real-time streaming and performed transformations on the data using Kafka and Spark Streaming.
- Developed Spark Streaming jobs in Scala to consume data from Kafka topics, apply transformations, and insert the results into HBase tables (see the sketch at the end of this list).
- Converted Hive/SQL queries into Spark transformations using Spark RDDs and object-oriented Python; developed and executed shell scripts to automate jobs.
- Fine-tuned Spark applications to optimize the performance of long-running jobs.
- Hands on experience in Hadoop administration and support activities for installations and configuring Apache Big Data Tools and Hadoop clusters using Cloudera Manager.
- Handled Hadoop cluster installations in various environments such as Unix, Linux and Windows.
- Assisted in upgrading, configuration and maintenance of various Hadoop infrastructures like Pig, Hive, and HBase.
- Performed Hadoop production support tasks by analyzing application and cluster logs.
- Created Hive tables, loaded them with data, and wrote Hive queries to process the data. Created partitions and used bucketing on Hive tables, setting the required parameters to improve performance. Developed Pig and Hive UDFs per business use cases.
- Created data pipelines to ingest and aggregate consumer response data and load it from AWS S3 buckets into Hive external tables in HDFS, serving as the feed for Tableau dashboards.
- Worked on various data formats such as Avro, SequenceFile, JSON, MapFile, Parquet, and XML.
- Worked extensively with AWS and related components such as Airflow, Elastic MapReduce (EMR), Athena, and Snowflake.
- Used Apache NiFi to automate data movement between different Hadoop components.
- Used NiFi to convert raw XML data into JSON and Avro.
- Designed and published visually rich and intuitive Tableau dashboards and Crystal Reports for executive decision making.
- Experienced in working with SQL, T-SQL, PL/SQL scripts, views, indexes, stored procedures, and other components of database applications.
- Experienced in working with Hadoop from the Cloudera Data Platform and running services through Cloudera Manager.
- Followed the Agile Scrum (Scrum Alliance) methodology for development.
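The sketch below illustrates the Spark Streaming pattern described in this list (consume from Kafka topics, transform, insert into HBase). It assumes the spark-streaming-kafka-0-10 integration and the HBase 1.x client API; the topic, table, and column-family names are hypothetical.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object KafkaToHBaseSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaToHBaseSketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Kafka connection details are placeholders.
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "localhost:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "events-consumer",
      "auto.offset.reset" -> "latest")

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("events"), kafkaParams))

    // Simple transformation, then write each partition to HBase.
    stream.map(record => (record.key, record.value.toUpperCase))
      .foreachRDD { rdd =>
        rdd.foreachPartition { partition =>
          // One HBase connection per partition, not per record.
          val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
          val table = connection.getTable(TableName.valueOf("events"))
          partition.foreach { case (key, value) =>
            val rowKey = if (key == null) value.hashCode.toString else key
            val put = new Put(Bytes.toBytes(rowKey))
            put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(value))
            table.put(put)
          }
          table.close()
          connection.close()
        }
      }

    ssc.start()
    ssc.awaitTermination()
  }
}
```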
Environment: Hadoop, HDFS, Spark, AWS, Vertica, Scala, Kafka, MapReduce, YARN, Drill, Pig, Hive, Java, NiFi, HBase, MySQL, Kerberos, Maven
Confidential, NY
Spark/Hadoop Developer
Responsibilities:
- Involved in gathering and analyzing business requirements.
- Worked with Hadoop 2.x and Spark 2.x (Python and Scala).
- Used Spark for interactive queries, processing of streaming data, and integration with NoSQL databases for huge volumes of data.
- Handled large datasets using partitions, Spark in-memory capabilities, broadcast variables, efficient joins, and transformations during the ingestion process itself.
- Developed custom ETL solutions, batch processing and real-time data ingestion pipeline to move data in and out of Hadoop using Python and shell scripting.
- Worked on improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN.
- Worked with Sqoop import and export functionalities to handle large data set transfer between Oracle databases and HDFS.
- Developed Spark jobs to clean data obtained from various feeds to make it suitable for ingestion into Hive tables for analysis.
- Imported data from various sources into Spark RDD for analysis.
- Developed Custom Input Formats in MapReduce jobs to handle custom file formats.
- Configured Oozie workflows to run multiple Hive and Pig jobs that run independently based on time and data availability.
- Utilized Hive tables and HQL queries for daily and weekly reports. Worked on complex data types in Hive like Structs and Maps.
- Developed Pig Latin scripts and HiveQL queries for trend analysis and pattern recognition on user data.
- Extensively used Pig for cleansing and pre-processing the data for analysis.
- Performed validation and standardization of raw data from XML and JSON files with Pig and MapReduce.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala (see the sketch at the end of this list).
- Developed traits and case classes in Scala.
- Worked on partitioning and bucketing of large amounts of data to optimize Hive query performance.
- Used Informatica for Hadoop to load data to and from HDFS and Hive tables.
- Designed ETL processes using Informatica to load data from flat files, Oracle, and Excel files into the target Oracle data warehouse database.
- Involved in writing Spark applications for Data validation, cleansing, transformations and custom aggregations.
- Used different data formats (Text format and Avro format) while loading the data into HDFS.
- Worked on SequenceFiles, RCFiles, map-side joins, bucketing, and partitioning for Hive performance enhancement and storage improvement.
- Analyzed the Hadoop log files using Pig scripts to identify errors.
- Used GitHub for version control amongst a team of developers.
- Worked with BI teams on generating reports and designing ETL workflows in Tableau.
- Scheduled daily workflow for extraction, processing and analysis of data using Oozie.
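As a sketch of the Hive-to-Spark conversions mentioned above, the Scala job below reproduces a simple aggregation query as RDD transformations; the HDFS paths, delimiter, and column positions are illustrative assumptions only.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object HiveToSparkSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("HiveToSparkSketch"))

    // Hive equivalent of the logic below:
    //   SELECT page, COUNT(*) AS hits
    //   FROM clickstream
    //   WHERE status = 200
    //   GROUP BY page;
    // Path and column positions are hypothetical.
    val lines = sc.textFile("hdfs:///data/clickstream/")

    val hitsByPage = lines
      .map(_.split("\t"))
      .filter(fields => fields.length > 2 && fields(2) == "200") // status column
      .map(fields => (fields(1), 1L))                            // page column
      .reduceByKey(_ + _)

    hitsByPage.saveAsTextFile("hdfs:///output/hits_by_page")
    sc.stop()
  }
}
```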
Environment: Hadoop, HDFS, Spark, MapReduce, Pig, Hive, Informatica, HQL, Sqoop, Flume, Oozie, Java, Scala, Python, Shell scripting, Maven, Eclipse, Putty, GIT, Tableau
Confidential
Spark/AWS Developer
Responsibilities:
- Worked on the large-scale Hadoop Yarn cluster for distributed data processing and analysis using Spark, Hive, and HBase.
- Involved in creating a data lake by extracting customer data from various data sources to HDFS which include data from Excel, databases, and log data from servers.
- Developed Spark applications by using Scala and Python. Implemented Apache Spark for data processing from various streaming sources.
- Developed a pre-processing job using Spark DataFrames to transform JSON documents into flat files (see the sketch at the end of this list).
- Worked with AWS cloud services like EC2, S3, RDS and VPC.
- Migrated existing on-premise application to AWS and used AWS services like EC2 and S3 to process and store small data sets.
- Experienced in maintaining the Hadoop cluster on AWS EMR.
- Imported data from AWS S3 into Spark RDD, performed transformations and actions on RDDs.
- Worked on importing metadata into Hive and migrating existing tables and applications to work on Hive and AWS cloud.
- Wrote scripts to migrate data from SQL Server, MySQL databases to Amazon Redshift.
- Performed ETL operations using Pig, Hive to transform transactional data into de-normalized form.
- Created adhoc reports by gathering requirements from different teams.
- Utilized Pig and Hive user defined functions to analyze the complex data to find specific user behavior.
- Analyzed data using HiveQL to derive metrics like game duration, daily active users (DAU), weekly active users (WAU) etc.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Worked on joining raw data with reference data using Pig scripts.
- Worked along with the admin team to assist them in adding/removing cluster nodes, cluster monitoring, and troubleshooting.
- Exported data to relational databases using Sqoop for visualization and to generate reports.
- Created machine learning and statistical models (SVM, CRF, HMM) to assess gamer performance.
- Identified and assessed available machine learning and statistical analysis libraries (including regressors, classifiers, statistical tests, and clustering algorithms).
- Worked closely with end users to finalize dashboard layouts and fine-tuned detail formatting for the desired look and feel.
- Generated reports and dashboards used for further descriptive/predictive analytics.
- Supported in setting up QA environment and updating configurations for implementing code changes and deployment.
- Worked with teams at various locations to gather data from different sources.
- Communicated the status of deliverables to users and SMEs and drove periodic review meetings.
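A minimal sketch of the JSON-flattening pre-processing job mentioned above, using Spark DataFrames. The S3 bucket, paths, and field names are hypothetical, and an S3 filesystem connector (e.g. s3a) is assumed to be configured on the cluster.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object JsonFlattenSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("JsonFlattenSketch")
      .getOrCreate()

    // Bucket, paths, and field names below are placeholders.
    val raw = spark.read.json("s3a://example-bucket/incoming/events/")

    // Flatten a nested structure such as
    // {"user": {"id": ..., "region": ...}, "event": ..., "ts": ...}
    // into a flat, delimiter-friendly layout.
    val flat = raw.select(
      col("user.id").alias("user_id"),
      col("user.region").alias("region"),
      col("event"),
      col("ts"))

    flat.write
      .option("header", "true")
      .csv("s3a://example-bucket/processed/events_flat/")

    spark.stop()
  }
}
```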
Environment: Hadoop, AWS, Spark, MapReduce, HDFS, Hive, Pig, Java, SQL, Zookeeper, PL/SQL, MySQL.
Confidential - Chicago, IL
Sr. Java Developer
Responsibilities:
- Involved in different phases of Software development life cycle (SDLC) which includes requirement gathering, design, analysis and code development.
- Extensively used Hibernate in the data access layer to access and update information in the database.
- Implemented various design patterns in the project such as Business Delegate, Session Façade, Data Transfer Object, Data Access Object, Service Locator and Singleton.
- Involved in developing code for obtaining bean references in the Spring framework using Dependency Injection (DI) / Inversion of Control (IoC) with annotations.
- Primarily focused on Spring components such as Dispatcher Servlets, Controllers, and Model and View objects.
- Utilized JSP and JSTL for presentation, invoking JavaBeans to interact with the controlling servlets.
- Created JUnit test suites to test the Hibernate DAOs and REST controllers and to upload sample data to the backend database.
- Wrote complex SQL queries using PL/SQL.
- Used ANT automated build scripts to compile and package the application and implemented Log4j.
- Analyzed business requirements and verified them with functionality and features of NoSQL database HBase to determine optimal DB.
- Developed complex, usable, attractive and cross browser web interfaces that account for speed, file size, readability and accessibility.
- Implemented client-side data validations using JavaScript and JSF validators.
- Used AngularJS to build customer forms connecting to the backend.
- Used Java Messaging Service (JMS) for asynchronous exchange of important information.
- Developed applications using Eclipse as an IDE.
- Performed unit testing using the JUnit framework and used Struts test cases for testing action classes.
Environment: Java 1.5, JDBC, Spring, Hibernate, Eclipse IDE, Design Patterns, XML, Oracle, PL/SQL Developer, WebSphere, ANT, ClearCase, JUnit, UML, web services, SOAP, XSLT, Log4j, Tomcat 5.0
Confidential
Java Application Developer
Responsibilities:
- Analyzed and reviewed client requirements and design
- Worked on testing, debugging and troubleshooting all types of technical issues.
- Implemented MVC architecture using the Spring Framework; coding involved writing action classes, custom tag libraries, and JSPs.
- Good knowledge of OOP concepts, OOAD, and UML.
- Used JDBC for database connectivity and manipulation.
- Used Eclipse for the Development, Testing and Debugging of the application.
- Worked as a Java/J2EE backend developer creating the Maven web application project.
- Used a DOM parser to parse the XML files.
- Used the Log4j framework for logging debug, info, and error data.
- Used WinSCP to transfer files from the local system to other systems.
- Performed Test Driven Development (TDD) using JUnit.
- Used JProfiler for performance tuning.
- Built the application using Maven and deployed it using WebSphere Application Server.
- Gathered and collected information from various programs, analyzed time requirements and prepared documentation to change existing programs.
- Used SOAP for exchanging XML-based messages.
- Used Microsoft Visio for developing use case diagrams, sequence diagrams, and class diagrams in the design phase.
- Developed Custom Tags to simplify the JSP code.
- Designed UI screens using JSP and HTML.
- Actively involved in designing and implementing Factory method, Singleton, MVC and Data Access Object design patterns.
Environment: JDK 1.2, JavaScript, HTML, DHTML, XML, Struts, JSP, Servlets, JNDI, J2EE, Tomcat, Oracle, Maven, Microsoft Visio.
Confidential
Java / J2EE Developer
Responsibilities:
- Member of application development team at Pennywise Solutions.
- Implemented the presentation layer with HTML, CSS, and JavaScript.
- Developed web components using JSP, Servlets, and JDBC.
- Implemented secured cookies using Servlets.
- Wrote complex SQL queries and stored procedures.
- Implemented the persistence layer using the Hibernate API.
- Implemented transaction and session handling using Hibernate utilities.
- Implemented search queries using the Hibernate Criteria interface.
- Provided support for loan reports for CB&T.
- Designed and developed loan reports for Evans Bank using Jasper and iReport.
- Involved in fixing bugs and unit testing with test cases using JUnit.
- Resolved outage issues for loan reports.
- Maintained the Jasper server on the client server and resolved issues.
- Actively involved in system testing.
- Fine-tuned SQL queries for maximum efficiency to improve performance.
- Designed tables and indexes following normalization principles.
- Involved in Unit testing, Integration testing and User Acceptance testing.
- Utilized Java and SQL daily to debug and fix issues with client processes.
Environment: Java, Servlets, JSP, Hibernate, JUnit, Oracle DB, SQL, Jasper Reports, iReport, Maven, Jenkins.