Spark Developer Resume
Minneapolis, Minnesota
SUMMARY:
- 7 years of IT experience, including development and implementation of Hadoop, data warehousing solutions, and Java applications.
- Experience working with the Hadoop ecosystem using components such as HDFS, MapReduce, Hive, HBase, Sqoop, and Impala on the Cloudera distribution, with good knowledge of Hortonworks.
- Hands-on experience writing Sqoop scripts to import data from multiple RDBMS sources into HDFS.
- Experience in Hive partitioning and bucketing, performing joins on Hive tables, and implementing Hive SerDes.
- Good knowledge of writing Spark applications in PySpark and Scala using DataFrames (see the DataFrame sketch following this summary).
- Hands-on experience designing and developing Spark applications in Scala and PySpark to compare the performance of Spark with Hive and SQL/Oracle.
- Software developer in core Java application development, client/server applications, and internet/intranet database applications; developed, tested, and implemented applications using J2EE, JDBC, JSP, Servlets, Web Services, Oracle, PL/SQL, and relational databases.
- Experience developing Kafka producers and consumers for streaming millions of events per second (a producer sketch also follows this summary).
- Experience in manipulating/analyzing large datasets and finding patterns and insights within structured and unstructured data.
- Working experience building RESTful web services and RESTful APIs.
- Solid design skills using Java design patterns and the Unified Modeling Language (UML).
- Strong understanding of real-time streaming technologies Spark and Kafka.
- Strong understanding of logical and physical database models and entity-relationship modeling.
- Replaced existing MapReduce jobs and Hive scripts with Spark SQL and Spark data transformations for more efficient data processing.
- Strong understanding of the Java Virtual Machine and multithreading.
- Experience in writing complex SQL queries, creating reports and dashboards.
- Excellent analytical, communication, and interpersonal skills, along with a positive attitude.
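A minimal sketch of the kind of Scala DataFrame application described in the DataFrames bullet above; the table and column names (sales, region, amount) are illustrative assumptions rather than details from an actual engagement:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object SalesAggregation {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("SalesAggregation")
          .enableHiveSupport()   // read the same tables Hive would, for a like-for-like comparison
          .getOrCreate()

        // Read an existing Hive table and aggregate with the DataFrame API
        val sales = spark.table("sales")
        val totalsByRegion = sales
          .groupBy(col("region"))
          .agg(sum(col("amount")).alias("total_amount"))

        totalsByRegion.write.mode("overwrite").saveAsTable("sales_totals_by_region")
        spark.stop()
      }
    }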
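A hedged sketch of a Kafka producer of the sort mentioned in the Kafka bullet above, using the standard Java client from Scala; the broker address, topic name, and payload shape are placeholders:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object EventProducer {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")   // placeholder broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("acks", "1")                              // trade some durability for throughput

        val producer = new KafkaProducer[String, String](props)
        try {
          (1 to 1000).foreach { i =>
            // Fire-and-forget sends keep latency low at high event rates
            producer.send(new ProducerRecord[String, String]("events", s"key-$i", s"""{"event_id":$i}"""))
          }
        } finally {
          producer.close()
        }
      }
    }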
TECHNICAL SKILLS:
Programming/Scripting Languages: Scala, PySpark, Core Java, Python, SQL
Big Data: Hadoop, MapReduce, HDFS, Hive, Sqoop, Spark, Kinesis, Kafka
Other Tools: Microsoft Office tools, VSTS, VMware, Git, NoSQL, Oracle, MySQL, Apache Cassandra, HBase
Big Data Ecosystem: HDFS, Oozie, ZooKeeper, Spark SQL, Spark Streaming, Hue, Ambari, Impala
File Formats: TXT, XML, JSON, Avro, Parquet, ORC
Cloud Computing: AWS
Visualization and Reporting Tools: Tableau, Microsoft Power BI
PROFESSIONAL EXPERIENCE:
Confidential - Minneapolis, Minnesota
Spark Developer
Responsibilities:
- Working as a Big Data Engineer on Hortonworks distribution. Responsible for Data Ingestion, Data Cleansing, Data Standardization and Data Transformation.
- Working with Hadoop 2.x version and Spark 2.x (Python and Scala).
- Involved in extracting data from various data sources into Hadoop HDFS, including data from SFTP servers, GCS, and AWS S3 buckets.
- Worked on creating Hive managed and external tables based on the requirement.
- Implemented partitioning and bucketing on Hive tables for better performance (see the table-definition sketch after this list).
- Used Spark SQL to process the data on the Spark engine.
- Worked on Spark to improve performance and optimize existing Hadoop algorithms using Spark SQL and Scala.
- Worked on Google Cloud BigQuery to execute queries and analyze data quickly.
- Worked with various file formats such as Parquet, JSON, and ORC.
- Developed an end-to-end ETL pipeline using Spark SQL and Scala on the Spark engine (see the pipeline sketch after this list).
- Worked with external vendors or partners to onboard external data into Confidential GCS buckets.
- Worked on Oozie to develop workflows to automate ETL data pipeline.
- Worked with visualization tools such as Google Data Studio and internal tools such as Domo.
- Used Spark for interactive queries, processing of streaming data, and integration with NoSQL databases for huge volumes of data.
- Experienced in handling large datasets using partitions, Spark in-memory capabilities, broadcasts, effective and efficient joins, transformations, and other operations during the ingestion process itself (a broadcast-join sketch follows this list).
- Developed custom ETL solutions, batch processing, and real-time data ingestion pipelines to move data in and out of Hadoop using PySpark and shell scripting.
- Explored Spark for improving the performance and optimization of existing algorithms in Hadoop using Spark Context, Spark SQL, and DataFrames.
- Worked with Sqoop import and export functionalities to handle large data set transfer between Oracle databases and HDFS.
- Developed Spark jobs to clean data obtained from various feeds to make it suitable for ingestion into Hive tables for analysis.
- Imported data from various sources into Spark RDD for analysis.
- Configured Oozie workflow to run multiple Hive jobs which run independently with time and data availability.
- Utilized Hive tables and HQL queries for daily and weekly reports. Worked on complex data types in Hive like Structs and Maps.
- Imported data from AWS S3 into Spark RDD, performed transformations and actions on RDDs.
- Created Cassandra tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application and BI teams to ensure data quality and availability.
- Designed ETL processes using Informatica to load data from flat files, Oracle, and Excel files into the Confidential Oracle Data Warehouse database.
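A sketch of the partitioned and bucketed Hive table definition referenced above, issued through Spark's Hive support; the table name, columns, and bucket count are illustrative assumptions:

    import org.apache.spark.sql.SparkSession

    object CreatePartitionedTable {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("CreatePartitionedTable").enableHiveSupport().getOrCreate()

        // Partition by ingest date and bucket by customer_id so date-range scans
        // prune partitions and joins on customer_id can use bucketed map-side joins.
        spark.sql(
          """
            |CREATE TABLE IF NOT EXISTS orders_curated (
            |  order_id     BIGINT,
            |  customer_id  BIGINT,
            |  amount       DOUBLE
            |)
            |PARTITIONED BY (ingest_date STRING)
            |CLUSTERED BY (customer_id) INTO 32 BUCKETS
            |STORED AS ORC
            |""".stripMargin)

        spark.stop()
      }
    }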
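A minimal sketch of the end-to-end Spark SQL/Scala ETL pipeline referenced above, assuming JSON files landed in a placeholder S3 path and a Parquet-backed Hive target table:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object IngestAndTransform {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("IngestAndTransform").enableHiveSupport().getOrCreate()

        // Extract: raw JSON landed in a cloud bucket (placeholder path)
        val raw = spark.read.json("s3a://example-landing-bucket/orders/")

        // Transform: basic cleansing and standardization
        val cleaned = raw
          .filter(col("order_id").isNotNull)
          .withColumn("amount", col("amount").cast("double"))
          .withColumn("ingest_date", current_date().cast("string"))
          .dropDuplicates("order_id")

        // Load: write into a partitioned, Parquet-backed Hive table
        cleaned.write
          .mode("append")
          .partitionBy("ingest_date")
          .format("parquet")
          .saveAsTable("orders_parquet")

        spark.stop()
      }
    }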
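A short sketch of the broadcast-join pattern mentioned above for efficient joins during ingestion; the paths and column names are assumptions:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.broadcast

    object BroadcastJoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("BroadcastJoinSketch").getOrCreate()

        val transactions = spark.read.parquet("s3a://example-bucket/transactions/") // large fact data (placeholder path)
        val storeLookup  = spark.read.parquet("s3a://example-bucket/stores/")       // small dimension table

        // Broadcasting the small side avoids shuffling the large dataset,
        // which is the main cost of joins performed during ingestion.
        val enriched = transactions.join(broadcast(storeLookup), Seq("store_id"), "left")

        enriched.write.mode("overwrite").parquet("s3a://example-bucket/enriched/")
        spark.stop()
      }
    }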
Environment: Spark, Spark SQL, Hive, Oozie, Sqoop, Flume, Java, Scala, PySpark, Shell scripting, Tableau.
Confidential - Atlanta, Georgia
Hadoop developer
Responsibilities:
- Involved in review of functional and non-functional requirements.
- Utilized Sqoop, Kafka, Flume, and Hadoop File System APIs for implementing data ingestion pipelines.
- Worked on real-time streaming and performed transformations on the data using Kafka and Spark Streaming.
- Migrated existing on-premise application to AWS and used AWS services like EC2 and S3 to process and store small data sets.
- Experienced in maintaining the Hadoop cluster on AWS EMR.
- Worked on importing metadata into Hive and migrating existing tables and applications to work on Hive and AWS cloud.
- Involved in managing and reviewing Hadoop log files. Designed and implemented an ETL framework using Java.
- Worked on real-time streaming and performed transformations on the data using Kafka and Flume.
- Developed Spark Streaming jobs in Scala to consume data from Kafka topics, transform the data, and insert it into HBase tables (see the streaming sketch after this list).
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Wrote Hive queries for data analysis to meet the business requirements.
- Created Hive tables and worked on them using HiveQL.
- Experienced in defining job flows.
- Gained good experience with MySQL and Cassandra databases.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Developed a custom filesystem plug-in for Hadoop so it can access files on the data platform.
- Designed and implemented a MapReduce-based large-scale parallel relation-learning system.
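A hedged sketch of a Spark Streaming job of the kind referenced above, consuming a Kafka topic and writing to an HBase table; the broker, topic, table, and column family names are placeholders:

    import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
    import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
    import org.apache.hadoop.hbase.util.Bytes
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

    object KafkaToHBase {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("KafkaToHBase"), Seconds(10))

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "localhost:9092",          // placeholder broker
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "events-consumer",
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
        )

        stream.foreachRDD { rdd =>
          rdd.foreachPartition { records =>
            // One HBase connection per partition, reused for all records in it
            val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
            val table = conn.getTable(TableName.valueOf("events"))
            records.foreach { rec =>
              val rowKey = Option(rec.key()).getOrElse(java.util.UUID.randomUUID().toString)
              val put = new Put(Bytes.toBytes(rowKey))
              put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes(rec.value()))
              table.put(put)
            }
            table.close()
            conn.close()
          }
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }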
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Hortonworks and Cloudera distributions, HBase, Linux, XML, MySQL, ETL, Kafka, YARN, Drill, Cassandra
Confidential
Hadoop Developer
Responsibilities:
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System.
- Performed ETL operations using Hive to transform transactional data into de-normalized form.
- Created ad-hoc reports by gathering requirements from different teams.
- Utilized Hive user defined functions to analyze the complex data to find specific user behavior.
- Analyzed data using HiveQL to derive metrics such as game duration, daily active users (DAU), and weekly active users (WAU) (see the metrics sketch after this list).
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Worked with the admin team to assist in adding/removing cluster nodes, cluster monitoring, and troubleshooting.
- Exported data to relational databases using Sqoop for visualization and to generate reports.
- Created machine learning and statistical models (SVM, CRF, HMM) to assess gamer performance.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files.
- Tested raw data and executed performance scripts.
- Shared responsibility for administration of Hadoop and Hive.
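A sketch of the DAU/WAU-style HiveQL referenced above; it is issued through Spark's Hive support here only to keep the examples in one language (in the role itself the equivalent queries would run directly in Hive), and the game_events table and its columns are assumptions:

    import org.apache.spark.sql.SparkSession

    object ActiveUserMetrics {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ActiveUserMetrics").enableHiveSupport().getOrCreate()

        // Daily active users: distinct players per calendar day
        val dau = spark.sql(
          """
            |SELECT to_date(event_time)     AS activity_date,
            |       COUNT(DISTINCT user_id) AS dau
            |FROM game_events
            |GROUP BY to_date(event_time)
            |""".stripMargin)

        // Weekly active users: distinct players per week of the year
        val wau = spark.sql(
          """
            |SELECT year(event_time)        AS activity_year,
            |       weekofyear(event_time)  AS activity_week,
            |       COUNT(DISTINCT user_id) AS wau
            |FROM game_events
            |GROUP BY year(event_time), weekofyear(event_time)
            |""".stripMargin)

        dau.show()
        wau.show()
        spark.stop()
      }
    }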
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, PL/SQL, MySQL, Sqoop, Linux, XML.
Confidential
Core Java/J2EE Developer
Responsibilities:
- Troubleshot various software issues using debugging processes and coding techniques.
- Provided high-level customer support to remote clients using a support e-ticketing system.
- Performed system administration for hosting server and client software.
- Developed RESTful web services and RESTful APIs.
- Developed screens using Java, HTML, DHTML, CSS, JSP and JavaScript.
- Designed the database for the application.
- Implemented all validations and performed testing.
- Implemented and managed SQL database for use in background for security and internal proprietary processes.
- Diagnosed and corrected errors within Java/HTML/PHP code to allow for connection and utilization of proprietary applications.
- Provided end-user support and administrative functions, including password and account management.
- Developed a PL/SQL view function in the Oracle 9i database for the get-available-date module.
- Used Quartz schedulers to run jobs sequentially within the given time (see the scheduler sketch after this list).
- Used JSP and JSTL Tag Libraries for developing User Interface components.
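A hedged sketch of Quartz-based scheduling as mentioned above, written in Scala to keep the examples in one language; the job class and 30-minute interval are illustrative assumptions:

    import org.quartz.{Job, JobBuilder, JobExecutionContext, SimpleScheduleBuilder, TriggerBuilder}
    import org.quartz.impl.StdSchedulerFactory

    // Illustrative job; in the real application this would wrap the batch task to run
    class AvailabilityRefreshJob extends Job {
      override def execute(context: JobExecutionContext): Unit = {
        println("refreshing available dates...")   // placeholder for the actual work
      }
    }

    object SchedulerSketch {
      def main(args: Array[String]): Unit = {
        val scheduler = StdSchedulerFactory.getDefaultScheduler

        val job = JobBuilder.newJob(classOf[AvailabilityRefreshJob])
          .withIdentity("availabilityRefresh")
          .build()

        // Fire every 30 minutes; successive runs can be kept sequential by
        // annotating the job class with @DisallowConcurrentExecution.
        val trigger = TriggerBuilder.newTrigger()
          .withIdentity("availabilityRefreshTrigger")
          .startNow()
          .withSchedule(SimpleScheduleBuilder.simpleSchedule()
            .withIntervalInMinutes(30)
            .repeatForever())
          .build()

        scheduler.start()
        scheduler.scheduleJob(job, trigger)
      }
    }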
Environment: JDK 5.0, JavaScript, HTML, DHTML, XML, Struts, JSP, Servlets, JNDI, J2EE, Tomcat, Oracle, RESTful API.
Confidential
Java/J2EE Developer
Responsibilities:
- Analyzed and reviewed client requirements and design.
- Worked on testing, debugging and troubleshooting all types of technical issues.
- Good knowledge of OOP concepts.
- Used JDBC for database connectivity and manipulation.
- Used Eclipse for the Development, Testing and Debugging of the application.
- Worked as a Java/J2EE backend developer creating the Maven web application project.
- Used a DOM parser to parse the XML files (see the parsing sketch after this list).
- Used the Log4j framework for logging debug, info, and error data.
- Used WinSCP to transfer files from the local system to other systems.
- Performed Test Driven Development (TDD) using JUnit.
- Used JProfiler for performance tuning.
- Built the application using Maven and deployed it on WebSphere Application Server.
- Gathered and collected information from various programs, analyzed time requirements and prepared documentation to change existing programs.
- Used SOAP for exchanging XML based messages.
- Used Microsoft Visio for developing use case diagrams, sequence diagrams, and class diagrams in the design phase.
- Developed Custom Tags to simplify the JSP code.
- Designed UI screens using JSP and HTML.
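A short sketch of DOM parsing as mentioned above, again in Scala for consistency with the other examples; the file name and element/attribute names are placeholders:

    import java.io.File
    import javax.xml.parsers.DocumentBuilderFactory

    object XmlConfigReader {
      def main(args: Array[String]): Unit = {
        // Parse the whole document into an in-memory DOM tree
        val builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()
        val doc = builder.parse(new File("config.xml"))    // placeholder file name
        doc.getDocumentElement.normalize()

        // Walk every <property> element and print its name attribute and text content
        val nodes = doc.getElementsByTagName("property")   // placeholder element name
        (0 until nodes.getLength).foreach { i =>
          val element = nodes.item(i).asInstanceOf[org.w3c.dom.Element]
          println(s"${element.getAttribute("name")} = ${element.getTextContent.trim}")
        }
      }
    }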
Environment: Java, HTML, Servlets, JSP, Hibernate, JUnit, Oracle DB, SQL, Jasper Reports, iReport, Maven, Jenkins.