
Scala/Spark Developer Resume


Bloomington, MN

PROFESSIONAL SUMMARY:

  • 8+ years of overall experience with a strong emphasis on the design, development, implementation, testing, and deployment of software applications.
  • 4+ years of comprehensive IT experience in Big Data and Big Data analytics: Hadoop, HDFS, MapReduce, YARN, the Hadoop ecosystem, and shell scripting.
  • 5+ years of development experience using Java, J2EE, JSP, and Servlets.
  • Highly capable of processing large sets of structured, semi-structured, and unstructured data and supporting Big Data applications.
  • Hands-on experience with Hadoop ecosystem components such as MapReduce (processing), HDFS (storage), YARN, Sqoop, Pig, Hive, HBase, Oozie, ZooKeeper, Spark, Spark SQL, and PySpark for data storage and analysis.
  • Experience and strong knowledge in implementing Spark Core, Spark Streaming, Spark SQL, and MLlib.
  • Expertise in transferring data between the Hadoop ecosystem and structured data stores in an RDBMS such as MySQL, Oracle, Teradata, or DB2 using Sqoop.
  • Experience with NoSQL databases such as MongoDB, HBase, and Cassandra.
  • Experience with Apache Spark clusters and stream processing using Spark Streaming.
  • Expertise in moving large volumes of log, streaming-event, and transactional data using Flume.
  • Experience in developing MapReduce jobs in Java for data cleaning and preprocessing.
  • Expertise in writing Pig Latin and Hive scripts and extending their functionality with User Defined Functions (UDFs).
  • Expertise in controlling data layout using partitions and bucketing in Hive (see the sketch after this list).
  • Expertise in preparing interactive data visualizations from different sources using Tableau.
  • Hands-on experience in developing Oozie workflows that execute MapReduce, Sqoop, Pig, Hive, and shell-script actions.
  • Experience working with the Cloudera Hue interface and Impala.
  • Hands-on experience building Solr indexes using the MapReduce Indexer Tool.
  • Expertise in Object-Oriented Analysis and Design (OOAD) using UML and various design patterns.
  • Experience in Java, JSP, Servlets, EJB, WebLogic, WebSphere, Hibernate, Spring, JBoss, JDBC, RMI, JavaScript, Ajax, jQuery, XML, and HTML.
  • Fluent in core Java concepts such as I/O, multithreading, exceptions, regular expressions, data structures, and serialization.
  • Performed unit testing with the JUnit framework and used Log4j to monitor error logs.
  • Experience in process improvement, normalization/denormalization, and data extraction, cleansing, and manipulation.
  • Experience converting requirement specifications and source-system understanding into conceptual, logical, and physical data models and data flow diagrams (DFDs).
  • Expertise in working with transactional databases such as Oracle, SQL Server, MySQL, and DB2.
  • Expertise in developing SQL queries and stored procedures, with excellent development experience under Agile methodology.
  • Highly skilled in systems analysis, ER/dimensional data modeling, database design, and implementing RDBMS-specific features.
  • Excellent leadership, interpersonal, problem-solving, and time-management skills.
  • Excellent communication skills, both written (documentation) and verbal (presentation).
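
As an illustration of the Hive partitioning and bucketing noted above, a minimal Spark/Scala sketch; the table, column, and path names are hypothetical, and it uses Spark's native DataFrameWriter bucketing rather than a Hive DDL statement:

    import org.apache.spark.sql.SparkSession

    object PartitionBucketSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("partition-bucket-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Hypothetical input: one row per sale, carrying a load_date column.
        val sales = spark.read.parquet("/data/raw/sales")

        // Partitioning by load_date lets date-filtered queries prune whole
        // directories; bucketing by customer_id lets joins on that key
        // avoid a full shuffle.
        sales.write
          .partitionBy("load_date")
          .bucketBy(32, "customer_id")
          .sortBy("customer_id")
          .format("parquet")
          .saveAsTable("analytics.sales")

        spark.stop()
      }
    }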

TECHNICAL SKILLS:

Technology: Hadoop ecosystem, J2SE, J2EE, Oracle.

Big Data Technologies: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Spark, Scala, Impala, Kafka, Hue, Sqoop, Oozie, Flume, ZooKeeper, Cassandra, Cloudera CDH5, Python, PySpark, Solr, and Hortonworks.

DBMS/Databases: Oracle, MySQL, SQL Server, DB2, MongoDB, Teradata, HBase, Cassandra.

Programming Languages/Web: C, C++, Java SE, XML, JSP/Servlets, Struts, Spring, HTML, JavaScript, jQuery, Web services.

Big Data Ecosystem: HDFS, MapReduce, Oozie, Hive, Pig, Sqoop, Flume, ZooKeeper, HBase, Storm, Kafka, Spark, Scala.

Methodologies: Agile, Waterfall.

NoSQL Databases: Cassandra, MongoDB, HBase.

Version Control Tools: SVN, CVS, VSS, PVCS.

Reporting Tools: Crystal Reports, SQL Server Reporting Services, Data Reports, and Business Intelligence and Reporting Tool (BIRT).

PROFESSIONAL EXPERIENCE:

Confidential, Bloomington, MN

Scala/Spark Developer

Responsibilities:

  • Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
  • Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala, and Python.
  • Involved in the complete Big Data flow of the application, from upstream data ingestion into HDFS through processing and analyzing the data in HDFS.
  • Knowledge of running Hive queries through Spark SQL, integrated with the Spark environment and implemented in Scala.
  • Used the Spark Streaming API with Kafka to build live dashboards; worked on transformations and actions on RDDs, Spark Streaming, pair-RDD operations, checkpointing, and SBT (see the streaming sketch after this list).
  • Implemented a POC to migrate MapReduce jobs to Spark RDD transformations using the Scala IDE for Eclipse.
  • Created Hive tables to import large data sets from various relational databases using Sqoop and exported the analyzed data back for visualization and report generation by the BI team.
  • Designed the code-promotion flow from the development environment through the quality-analysis environment to production; implemented real-time streaming of data using Spark with Kafka.
  • Involved in scheduling the Oozie workflow engine to run multiple Hive and Pig jobs.
  • Developed a process for batch ingestion of CSV files and Sqoop loads from different sources, and generated views on the data sources using shell scripting and Python.
  • Integrated a shell script to create collections/morphlines and Solr indexes on top of table directories using the MapReduce Indexer Tool within the batch-ingestion framework.
  • Implemented partitioning, dynamic partitions, and buckets in Hive.
  • Developed Hive scripts to create views and apply transformation logic in the target database.
  • Involved in the design of the Data Mart and Data Lake to provide faster insight into the data.
  • Applied a Markov model and Bayes' theorem in PySpark to predict the probability of a customer buying one product given that they bought another.
  • Experienced in using Kafka as a data pipeline between JMS (producer) and a Spark Streaming application (consumer).
  • Involved in the development of a Spark Streaming application for one of the data sources using Scala and Spark, applying the required transformations.
  • Developed a script in Scala to read all the Parquet tables in a database and export them as JSON files, and another to parse them into structured Hive tables (see the export sketch after this list).
  • Created data-model templates and sessions; resolved Change Requests (CRs) and updated data models.
  • Configured ZooKeeper for cluster coordination services.
  • Developed a unit-test script that reads a Parquet file for testing PySpark on the cluster.
  • Involved in exploring new technologies such as AWS, Apache Flink, and Apache NiFi that can increase business value.
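
A minimal sketch of the Kafka-to-Spark-Streaming pattern described above, using the spark-streaming-kafka-0-10 direct stream; the broker address, topic, group id, and checkpoint path are hypothetical:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka010._

    object KafkaStreamSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-stream-sketch")
        val ssc  = new StreamingContext(conf, Seconds(10))
        // Checkpointing enables recovery and stateful operations.
        ssc.checkpoint("/tmp/checkpoints/kafka-stream-sketch")

        val kafkaParams = Map[String, Object](
          "bootstrap.servers"  -> "broker1:9092",
          "key.deserializer"   -> classOf[StringDeserializer],
          "value.deserializer" -> classOf[StringDeserializer],
          "group.id"           -> "dashboard-consumer",
          "auto.offset.reset"  -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
          ssc,
          LocationStrategies.PreferConsistent,
          ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams)
        )

        // Per-batch count of events by key -- the kind of rolling
        // aggregate a live dashboard would poll.
        stream.map(record => (record.key, 1L))
              .reduceByKey(_ + _)
              .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }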
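
The Parquet-to-JSON export script mentioned above could look roughly like this; the database name and output path are hypothetical:

    import org.apache.spark.sql.SparkSession

    object ParquetToJsonSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("parquet-to-json-sketch")
          .enableHiveSupport()
          .getOrCreate()

        val db = "analytics" // hypothetical database name

        // Enumerate every table in the database, read it, and re-export
        // it as JSON under a mirrored output directory.
        spark.catalog.listTables(db).collect().foreach { t =>
          spark.table(s"$db.${t.name}")
            .write
            .mode("overwrite")
            .json(s"/data/export/json/$db/${t.name}")
        }

        spark.stop()
      }
    }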

Environment: Hadoop, HDFS, MapReduce, Hive, HBase, ZooKeeper, Impala, Java (JDK 1.6), Cloudera, Oracle, SQL Server, UNIX shell scripting, Flume, Oozie, Scala, Spark, Sqoop, Python, Kafka, PySpark.

Confidential, Santa Clara, CA

Sr. Hadoop Developer

Responsibilities:

  • Responsible for writing MapReduce jobs to perform operations such as copying data on HDFS and defining job flows on an EC2 server, and for loading and transforming large sets of structured, semi-structured, and unstructured data.
  • Developed a process for Sqooping data from multiple sources such as SQL Server, Oracle, and Teradata.
  • Responsible for creating the source-to-destination field-mapping document.
  • Developed a shell script to create staging and landing tables with the same schema as the source and to generate the properties used by Oozie jobs.
  • Completed data extraction, aggregation, and analysis in HDFS using PySpark and stored the required data in the Hive database.
  • Worked with NoSQL databases such as HBase, creating HBase tables to load large sets of semi-structured data coming from various sources.
  • Performed performance optimizations on Spark/Scala; diagnosed and resolved performance issues.
  • Configured Spark Streaming to receive real-time data from Kafka and store the streamed data in HDFS using Scala.
  • Responsible for developing Python wrapper scripts that extract a specific date range using Sqoop by passing the custom properties required for the workflow.
  • Involved in enhancing speed and performance using Apache Spark.
  • Developed Spark code using Scala and Spark SQL for batch processing of data.
  • Involved in parsing JSON data into a structured format and loading it into HDFS/Hive using Spark Streaming (see the parsing sketch after this list).
  • Developed Hive scripts to apply transformation logic and load data from the staging zone to the final landing zone.
  • Worked with the Parquet file format to get better storage and performance for publish tables.
  • Involved in loading transactional data into HDFS using Flume for fraud analytics.
  • Developed a Python utility to validate HDFS tables against source tables.
  • Designed and developed UDFs to extend functionality in both Pig and Hive.
  • Imported and exported data between MySQL and HDFS using Sqoop on a regular basis.
  • Responsible for developing multiple Kafka producers and consumers from scratch per the software requirement specifications (see the producer sketch after this list).
  • Explored Spark for improving the performance and optimization of existing algorithms in Hadoop.
  • Experience with the Ab Initio Data Quality suite, an ETL application used for drag-and-drop component development and handling large volumes of data.
  • Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and in optimizing it using SparkContext, Spark SQL, pair RDDs, and Spark on YARN.
  • Migrated the needed data from Oracle and MySQL into HDFS using Sqoop and imported flat files of various formats into HDFS.
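
A batch-flavored Scala sketch of the JSON parse-and-load step described above (the production flow used Spark Streaming; the paths, field names, and target table here are hypothetical):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    object JsonToHiveSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("json-to-hive-sketch")
          .enableHiveSupport()
          .getOrCreate()

        // Spark infers the schema directly from the JSON documents.
        val raw = spark.read.json("/data/landing/events/*.json")

        // Derive and clean the fields needed downstream.
        val cleaned = raw
          .withColumn("event_date", to_date(col("event_ts")))
          .filter(col("event_id").isNotNull)

        // Parquet provides columnar storage and predicate pushdown for
        // the publish tables mentioned above.
        cleaned.write
          .mode("overwrite")
          .format("parquet")
          .saveAsTable("publish.events")

        spark.stop()
      }
    }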
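
A bare-bones Kafka producer in Scala, of the kind described above; the broker address, topic, and payload are hypothetical:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    object ProducerSketch {
      def main(args: Array[String]): Unit = {
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092") // hypothetical broker
        props.put("key.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")

        val producer = new KafkaProducer[String, String](props)
        try {
          // Send a single keyed JSON message to the hypothetical topic.
          producer.send(new ProducerRecord[String, String]("events", "key-1", """{"id":1}"""))
          producer.flush()
        } finally {
          producer.close()
        }
      }
    }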

Environment: Hadoop, HDFS, MapReduce, Hive, HBase, Kafka, ZooKeeper, Oozie, Impala, Java (JDK 1.6), Cloudera, Oracle, Teradata, SQL Server, UNIX shell scripting, Flume, Scala, Spark, Sqoop, Python.

Confidential, El Segundo, CA

Sr. Hadoop Developer

Responsibilities:

  • Responsible for managing, analyzing, and transforming petabytes of data, and for quick validation checks on FTP file arrivals from an S3 bucket to HDFS.
  • Responsible for analyzing large data sets and deriving customer usage patterns by developing new MapReduce programs.
  • Experienced in creating Hive tables and loading data into them incrementally using dynamic partitioning; worked with Avro files and JSON records.
  • Experienced in using Pig for data cleansing; developed Pig Latin scripts to extract data from web server output files and load it into HDFS.
  • Worked on Hive by creating external and internal tables, loading them with data, and writing Hive queries.
  • Involved in the development and use of UDTFs and UDAFs for decoding log-record fields and conversions, generating minute buckets for specified time intervals, and JSON field extraction.
  • Developed Pig and Hive UDFs to analyze complex data and identify specific user behavior.
  • Responsible for debugging and optimizing Hive scripts and for implementing deduplication logic in Hive using a rank-key function (UDF); see the sketch after this list.
  • Experienced in writing Hive validation scripts used in a validation framework (for daily analysis through graphs presented to business users).
  • Developed Oozie workflows to automate the tasks of loading data into HDFS and pre-processing it with Pig and Hive.
  • Involved in Cassandra database schema design.
  • Pushed data to Cassandra databases using the BULK LOAD utility.
  • Responsible for creating dashboards on Tableau Server.
  • Generated reports on Hive tables for different scenarios using Tableau.
  • Responsible for scheduling using Active Batch jobs and cron jobs.
  • Experienced with Jar builds triggered by commits to GitHub using Jenkins.
  • Explored new data-tagging tools such as Tealium (POC report).
  • Actively updated upper management with daily progress reports on the project, including the classification levels achieved on the data.
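
The deduplication logic described above, expressed as an equivalent Spark/Scala window-function sketch rather than the original Hive rank-key UDF; the table and column names are hypothetical:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    object DedupSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("dedup-sketch")
          .enableHiveSupport()
          .getOrCreate()

        val records = spark.table("staging.log_records") // hypothetical table

        // Rank duplicates of the same business key by recency and keep
        // only the newest row per key.
        val byKey = Window.partitionBy("record_key").orderBy(col("load_ts").desc)

        records
          .withColumn("rk", row_number().over(byKey))
          .filter(col("rk") === 1)
          .drop("rk")
          .write
          .mode("overwrite")
          .saveAsTable("publish.log_records")

        spark.stop()
      }
    }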

Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, ZooKeeper, Oozie, Impala, Cassandra, Java (JDK 1.6), Cloudera, Oracle 11g/10g, Windows NT, UNIX shell scripting, Tableau, Tealium.

Confidential, Cincinnati, OH

Sr. Java Developer

Responsibilities:

  • Responsible for understanding the scope of the project and for requirements gathering.
  • Used MapReduce to index large amounts of data so specific records could be accessed easily.
  • Supported MapReduce programs running on the cluster.
  • Developed MapReduce programs to perform data filtering for unstructured data.
  • Designed the application by implementing the Struts framework based on the MVC architecture.
  • Designed and developed the front end using JSP, HTML, JavaScript, and jQuery.
  • Developed a framework for data processing using design patterns, Java, and XML.
  • Implemented J2EE standards and the MVC2 architecture using the Struts framework.
  • Implemented Servlets, JSP, and Ajax to design the user interface.
  • Used JSP, JavaScript, HTML5, and CSS for manipulating, validating, and customizing error messages in the user interface.
  • Used the lightweight container of the Spring framework to provide architectural flexibility through Inversion of Control (IoC).
  • Used Spring IoC for dependency injection with the Hibernate and Spring frameworks.
  • Designed and developed session beans to implement the business logic.
  • Developed EJB components deployed on the WebLogic application server.
  • Wrote unit tests using the JUnit framework, with logging done through the Log4j framework.
  • Used HTML, CSS, JavaScript, and jQuery to develop front-end pages.
  • Designed and developed various configuration files for Hibernate mappings.
  • Designed and developed SQL queries and stored procedures.
  • Used XML, XSLT, and XPath to extract data from web-services output XML.
  • Extensively used JavaScript, jQuery, and AJAX for client-side validation.
  • Used Ant scripts to fetch, build, and deploy the application to the development environment.
  • Developed web services for sending and receiving data between different applications using SOAP messages.
  • Actively involved in code reviews and bug fixing.
  • Applied CSS (Cascading Style Sheets) across the entire site for standardization.
  • Provided offshore coordination and user-acceptance-testing support.

Environment: Java 5.0, Struts, Spring 2.0, Hibernate 3.2, WebLogic 7.0, Eclipse 3.3, Oracle 10g, JUnit 4.2, Maven, Windows XP, J2EE, JSP, JDBC, Hibernate, Spring, HTML, XML, CSS, JavaScript, and jQuery.

Confidential

Software Programmer

Responsibilities:

  • Involved in the analysis and design of the application using Rational Rose.
  • Developed various action classes to handle requests and responses.
  • Designed and created Java objects, JSP pages, JSF, JavaBeans, and Servlets to achieve various business functionalities, and created validation methods using JavaScript and backing beans.
  • Involved in writing client-side validations using JavaScript and CSS.
  • Involved in the design of the Referential Data Service module to interface with various databases using JDBC.
  • Used the Hibernate framework to persist employee work hours to the database.
  • Developed classes and interfaces for the underlying web-services layer.
  • Prepared documentation and participated in preparing the user manual for the application.
  • Prepared use cases, business process models, data flow diagrams, and user interface models.
  • Gathered and analyzed requirements for EAuto and designed process-flow diagrams.
  • Defined business processes related to the project and provided technical direction to the development workgroup.
  • Analyzed the legacy system and the Financial Data Warehouse.
  • Participated in database design sessions and database normalization meetings.
  • Managed change requests and defect tracking.
  • Managed UAT testing; developed test strategies and test plans and reviewed QA test plans for appropriate test coverage.
  • Involved in developing JSPs, action classes, form beans, response beans, and EJBs.
  • Extensively used XML to code configuration files.
  • Developed PL/SQL stored procedures and triggers.
  • Performed functional, integration, system, and validation testing.

Environment: Java, J2EE, JSP, JCL, DB2, Struts, SQL, PL/SQL, Eclipse, Oracle, Windows XP, HTML, CSS, JavaScript, and XML.
