
Big Data/ Scala Developer Resume

Memphis, TN

SUMMARY

  • Over 5 years of IT experience in analysis, design and development in Scala, Spark, Hadoop and HDFS environments, with experience in Java and J2EE.
  • Experienced in developing and implementing MapReduce programs using Hadoop per requirements.
  • Excellent experience with Scala, Apache Spark, Spark Streaming, pattern matching and MapReduce.
  • Developed ETL test scripts based on technical specifications/data design documents and source-to-target mappings.
  • Experienced in installing, configuring and administering Hadoop clusters on the major Hadoop distributions Hortonworks and Cloudera.
  • Wrote scripts to automate data loads and performed data transformation operations.
  • Experienced in working with different data sources such as flat files, spreadsheet files, log files and databases.
  • Experienced in working with Flume to load log data from multiple sources directly into HDFS.
  • Excellent experience with Apache Hadoop ecosystem components such as the Hadoop Distributed File System (HDFS), MapReduce, Sqoop, Apache Spark and Scala.
  • Wrote scripts and an indexing strategy for a migration from SQL Server and MySQL databases.
  • Extensive experience working with Oracle, DB2, SQL Server and MySQL databases, and with core Java concepts such as OOP, multithreading, collections and IO.
  • Developed Spark scripts using Scala shell commands per requirements; processed schema-oriented and non-schema-oriented data using Scala and Spark.
  • Experience with the Oozie workflow engine, running workflow jobs with actions that run Hadoop MapReduce and Pig jobs.
  • Experience with MapReduce, Pig, the programming model, and installation and configuration of Hadoop, HBase, Hive, Pig, Sqoop and Flume using Linux commands.
  • Experience in managing and reviewing Hadoop log files using Flume and Kafka; developed Pig UDFs and Hive UDFs to pre-process data for analysis.
  • Experience with NoSQL databases such as HBase and Cassandra. Involved in support for the weekly production maintenance window and the new Solr deployment process for the new environments.
  • Experience in UNIX shell scripting; proficient in Linux (UNIX) and Windows operating systems.
  • Experienced in setting up data gathering tools such as Flume and Sqoop.
  • Extensive knowledge of ZooKeeper for various types of centralized configuration.
  • Knowledge of monitoring and managing Hadoop clusters using Hortonworks.
  • Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
  • Experienced in analyzing, designing and developing ETL strategies and processes, and writing ETL specifications.
  • Experience with applications using Java, Python and UNIX shell scripting.
  • Good interpersonal, communication and problem-solving skills; a motivated team player.

TECHNICAL SKILLS

Hadoop/Big Data: HDFS, MapReduce, HBase, Pig, Hive, Sqoop, Flume, Linux, Oozie, ZooKeeper, Spark, Storm and Kafka

Java & J2EE Technologies: Core Java

IDEs: Eclipse, NetBeans

Big data Analytics: Datameer 2.0.5

Frameworks: MVC, Struts, Hibernate, Spring

Programming languages: C, C++, Java, Python, Ant scripts, Linux shell scripts

Databases: Oracle 11g/10g/9i, MySQL, DB2, MS-SQL Server

Web Servers: Web Logic, Web Sphere, Apache Tomcat

Web Technologies: HTML, XML, JavaScript, AJAX, SOAP, WSDL

Network Protocols: TCP/IP, UDP, HTTP, DNS, DHCP, FTP

ETL Tools: Talend, Informatica, Pentaho, SSRS, SSIS, BO, Crystal Reports, Cognos

Testing: Win Runner, Load Runner, QTP

PROFESSIONAL EXPERIENCE

Confidential, Memphis, TN

Big Data/ Scala Developer

Responsibilities:

  • Implemented a Hadoop cluster on Cloudera and assisted with performance tuning, monitoring and troubleshooting.
  • Installed and configured MapReduce, Hive and HDFS.
  • Created, altered and deleted topics (Kafka queues) as required. Performed performance tuning using partitioning and bucketing of Impala tables.
  • Involved in file movements between HDFS and AWS S3, and worked extensively with S3 buckets in AWS.
  • Created data partitions on large data sets in S3 and DDL on the partitioned data.
  • Converted all Hadoop jobs to run in EMR by configuring the cluster according to the data size.
  • Monitored and troubleshot Hadoop jobs using the YARN Resource Manager, and EMR job logs using Genie and Kibana.
  • Developed UDFs in Java for Hive and Pig, and worked on reading multiple data formats on HDFS using Scala.
  • Used big data tooling to load large volumes of source files from Hive into HBase.
  • Worked on iterative data validation and processing, done on Spark with the help of Scala.
  • Analyzed SQL scripts and designed the solution to implement them in Scala. Developed an analytical component using Scala, Spark and Spark Streaming.
  • Developed Scala scripts and UDFs using both DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation and queries, writing data back into the OLTP system through Sqoop.
  • Extensive experience in Spark Streaming (version 1.5.2) through the core Spark API, running Scala and Java to transform raw data from several data sources into baseline data.
  • Created RDDs in Spark and extracted data from the data warehouse onto the Spark RDDs.
  • Created indexes for various statistical parameters in Elasticsearch.
  • Involved in the development of Pig UDFs to pre-process the data for analysis. Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
  • Created HBase tables to store various formats of PII data coming from different portfolios. Processed data using Spark, and used Avro and Parquet file formats for serialization.
  • Implemented UDFs, UDAFs and UDTFs in Java for Hive to process data that can't be handled with Hive's built-in functions.
  • Developed a data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest claim data and financial histories into HDFS for analysis.
  • Used Hive partitioning and bucketing for performance optimization of the Hive tables and created around 20,000 partitions. Imported and exported data between HDFS and Hive using Sqoop.
  • Consumed data from the Kafka queue using Spark. Configured different topologies for the Spark cluster and deployed them on a regular basis.
  • Ran monthly security checks in UNIX and Linux environments and installed the security patches required to maintain a high level of security for clients.
  • Involved in loading data from the Linux file system to HDFS.
  • Managed workflow and scheduling for complex MapReduce jobs using Apache Oozie.
  • Developed a web application for the department for storing and retrieving employee data using a MySQL database.
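The Hive partitioning and bucketing mentioned above can be illustrated with a minimal, pure-Python sketch (hypothetical claim records and field names; no Hive required): rows are grouped under a partition key so a query only scans the matching group, which is the idea behind Hive partition pruning.

```python
from collections import defaultdict

# Hypothetical claim records; in Hive these would live in a table
# declared with PARTITIONED BY (month).
records = [
    {"claim_id": 1, "state": "TN", "month": "2016-01", "amount": 120.0},
    {"claim_id": 2, "state": "TX", "month": "2016-01", "amount": 75.5},
    {"claim_id": 3, "state": "TN", "month": "2016-02", "amount": 310.0},
]

# "Partition" the data by month up front, as Hive does on disk.
partitions = defaultdict(list)
for rec in records:
    partitions[rec["month"]].append(rec)

def total_for_month(month: str) -> float:
    """Only the matching partition is scanned -- the essence of pruning."""
    return sum(rec["amount"] for rec in partitions.get(month, []))

print(total_for_month("2016-01"))  # scans 2 rows, not all 3
```

In Hive itself, a `WHERE month = '2016-01'` predicate on a partitioned table skips the other partitions' files entirely, which is why partition counts in the tens of thousands (as above) can still query quickly.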

Environment: Hadoop, AWS, MapReduce, HBase, Elasticsearch, EMR, S3, Hive, Impala, Pig, Sqoop, HDFS, Flume, Oozie, Spark, Spark SQL, Spark Streaming, Scala, IntelliJ, Kafka and Cloudera.

Confidential - Des Moines, Iowa

Big Data/ Scala Developer

Responsibilities:

  • Implemented a Hadoop cluster on Hortonworks and assisted with performance tuning, monitoring and troubleshooting.
  • Responsible for building scalable distributed data solutions using Hadoop.
  • Worked extensively with Flume for importing social media data.
  • Worked on a project to retrieve log messages procured by leveraging Spark Streaming.
  • Designed Oozie jobs for the automated processing of similar data. Collected the data using Spark Streaming.
  • Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
  • Installed the Oozie workflow engine to run multiple Hive and Pig jobs. Used the Scala collections framework to store and process complex consumer information, and Scala functional programming concepts to develop business logic.
  • Worked with XML, extracting tag information from compressed blob datatypes using XPath and Scala XML libraries.
  • Created, modified and executed DDL and ETL scripts for de-normalized tables to load data into Hive tables.
  • Developed Spark scripts using the Scala IDE per business requirements.
  • Developed Pig scripts in areas where extensive coding needed to be reduced.
  • Worked with Spark Streaming to ingest data into the Spark engine. Extensively used FORALL and BULK COLLECT to fetch large volumes of data from tables.
  • Performed transformations, cleaning and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
  • Handled importing data from various sources using Sqoop, performed transformations using Hive and MapReduce, and loaded the data into HDFS.
  • Created HBase tables to store various formats of PII data coming from different portfolios. Processed the data using Spark.
  • Configured Sqoop and developed scripts to extract data from MySQL into HDFS.
  • Worked on running reports in a Linux environment and wrote shell scripts to generate them. Used Linux to manage files.
  • Hands-on experience productionizing Hadoop applications: administration, configuration management, monitoring, debugging and performance tuning.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase database and Sqoop.
  • Developed Spark scripts writing custom RDDs in Scala for data transformations and performing actions on the RDDs. Parsed high-level design specifications into simple ETL coding and mapping standards.
  • Provided cluster coordination services through ZooKeeper. Created Pig Latin scripts to sort, group, join and filter the data.
  • Partitioned data streams using Kafka. Designed and configured a Kafka cluster to accommodate a heavy throughput of 1 million messages per second. Used the Kafka producer 0.8.3 API to produce messages.
  • Built big data solutions using HBase, handling millions of records for the different data trends and exporting them to Hive.
  • Designed and developed database operations in PostgreSQL; experience working with the NoSQL database HBase as well as PostgreSQL.
  • Developed Hive scripts to perform transformations on the data and load it to target systems for use by data analysts for reporting.
  • Developed programs in Java and Scala/Spark for reformatting data after extraction from HDFS for analysis.
  • Developed Spark code in Scala using SBT, and used Spark SQL/Streaming for faster processing and testing of data.
  • Used the Scala collections framework, along with the Lift and Play frameworks, to store and process complex employer information. Based on the offers set up for each client, requests were post-processed and matched to offers.
  • Designed an application that receives data from several source systems and ingests it into a PostgreSQL database.
  • Used Oozie as the workflow engine and Falcon for job scheduling. Debugged technical issues and resolved errors.
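The XPath-based tag extraction described above can be sketched with Python's standard library (the actual work used Scala's XML libraries; the document and field names here are hypothetical):

```python
import xml.etree.ElementTree as ET

# Hypothetical consumer record, standing in for the XML blobs described above.
doc = """
<consumers>
  <consumer id="c1"><name>Alice</name><segment>gold</segment></consumer>
  <consumer id="c2"><name>Bob</name><segment>silver</segment></consumer>
</consumers>
"""

root = ET.fromstring(doc)

# XPath-style extraction of tag values, analogous to scala.xml's \\ operator.
names = [n.text for n in root.findall(".//consumer/name")]
gold_ids = [c.get("id") for c in root.findall(".//consumer")
            if c.findtext("segment") == "gold"]

print(names)     # ['Alice', 'Bob']
print(gold_ids)  # ['c1']
```

The same pattern (parse once, pull out a handful of fields by path) is what turns semi-structured XML blobs into rows that can be loaded into Hive tables.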

Environment: Hadoop, Linux, HDFS, Hortonworks, MapReduce, Pig, Hive, Sqoop, HBase, Oozie, Flume, ZooKeeper, Java, SQL, scripting, Scala, SBT, PostgreSQL, Spark, Kafka.

Confidential - Houston, TX

Hadoop Developer

Responsibilities:

  • Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
  • Designed an application that receives data from several source systems and ingests it into a PostgreSQL database.
  • Upgraded the Hadoop cluster from CDH3 to CDH4, setting up a High Availability cluster and integrating Hive with existing applications.
  • Worked on analyzing the Hadoop cluster and different big data analytic tools, including Pig, the HBase database and Sqoop.
  • Involved in loading data from the Linux file system to HDFS.
  • Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
  • Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
  • Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
  • Managed and reviewed Hadoop log files. Installed and deployed IBM WebSphere.
  • Implemented the NoSQL database HBase and managed the other tools and processes observed running on YARN.
  • Tested raw data and executed performance scripts. Shared responsibility for administration of Hadoop, Hive and Pig.
  • Experience deploying applications on heterogeneous application servers: Tomcat, WebLogic, IBM WebSphere and Oracle Application Server.
  • Responsible for developing MapReduce programs using text analytics and pattern matching algorithms.
  • Set up and benchmarked Hadoop/HBase clusters for internal use. Assisted the development team in installing single-node Hadoop 2.2.4 on a local machine.
  • Participated in architectural and design decisions with the respective teams. Developed an in-memory data grid solution across conventional and cloud environments using Oracle Coherence.
  • Used Pig to perform transformations, event joins, filters and some pre-aggregations before storing the data in HDFS.
  • Performed analysis with the data visualization tool Tableau. Wrote Pig scripts for data processing.
  • These new data items are used for further analytics/reporting purposes; the project has Cognos reports as the BI component.
  • Designed the database and created tables, and wrote complex SQL queries and stored procedures per requirements.
  • Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting on the dashboard. Loaded the aggregated data into DB2 for reporting on the dashboard.
Environment: Big Data/Hadoop, Python, Java, Agile, Spark Streaming, PostgreSQL, HDFS, MapReduce, Hive, Pig, Sqoop, Flume, ZooKeeper, Oozie, DB2, NoSQL, HBase, Cloudera, Tomcat and Tableau.

Confidential

Java Project

Responsibilities:

  • Involved in various phases of the Software Development Life Cycle (SDLC), such as design, development and unit testing.
  • Developed and deployed UI layer logic for sites using JSP, XML, JavaScript, HTML/DHTML and Ajax.
  • Followed the Agile Scrum methodology for the development process.
  • Developed prototype test screens in HTML and JavaScript.
  • Involved in developing JSPs for client data presentation, with client-side data validation within the forms.
  • Experience writing PL/SQL stored procedures, functions, triggers, Oracle reports and complex SQL.
  • Worked with JavaScript to perform client-side form validation, and introduced an improved logging approach across interdependent applications.
  • Used Struts tag libraries as well as the Struts Tiles framework.
  • Used JDBC to access the database with the Oracle thin (Type 4) driver for application optimization and efficiency. Created connections through JDBC and used JDBC statements to call stored procedures.
  • Performed client-side validation using JavaScript.
  • Used Data Access Objects to make the application more flexible for future and legacy databases.
  • Actively involved in tuning SQL queries for better performance.
  • Developed the application using the Spring MVC framework.
  • Used the collections framework to transfer objects between the different layers of the application.
  • Developed data mappings to create a communication bridge between various application interfaces using XML and XSL.
  • Proficient in developing applications with exposure to Java, JSP, UML, Oracle (SQL, PL/SQL), HTML, JUnit, JavaScript, Servlets, Swing, DB2 and CSS.
  • Actively involved in code review and bug fixing to improve performance.
  • Documented the application's functionality and enhanced features.
  • Successfully delivered all product deliverables with zero defects.
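The Data Access Object pattern used above can be sketched in Python with the stdlib sqlite3 module (the real project used Oracle through JDBC; the table and class names here are hypothetical):

```python
import sqlite3

class EmployeeDao:
    """Minimal DAO sketch: all database details live in one class, so
    swapping the backing database only touches this layer."""

    def __init__(self, conn):
        self.conn = conn
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS employees (id INTEGER PRIMARY KEY, name TEXT)")

    def add(self, emp_id, name):
        # Parameterized statement -- the same discipline as a JDBC
        # PreparedStatement, avoiding SQL injection and re-parsing.
        self.conn.execute("INSERT INTO employees VALUES (?, ?)", (emp_id, name))

    def find(self, emp_id):
        row = self.conn.execute(
            "SELECT name FROM employees WHERE id = ?", (emp_id,)).fetchone()
        return row[0] if row else None

dao = EmployeeDao(sqlite3.connect(":memory:"))
dao.add(1, "Alice")
print(dao.find(1))  # Alice
```

Because callers only see `add`/`find`, moving from one database to another (as the bullet about future and legacy databases describes) means rewriting this class, not the application code.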

Environment: Spring MVC, Oracle (SQL, PL/SQL), J2EE, Java, Struts, JDBC, Servlets, JSP, XML, Design Patterns, CSS, HTML, JavaScript 1.2, JUnit, Apache Tomcat, MS SQL Server 2008
