Big Data Developer Resume
Los Angeles, CA
PROFESSIONAL SUMMARY:
- Around 7 years of IT experience, including more than 4 years in Hadoop/Spark and 3 years as a Java and SQL developer with strong object-oriented programming skills.
- Excellent understanding/knowledge of Hadoop architecture and components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Experienced with Scala, PySpark, and Java, using IDEs such as IntelliJ IDEA and Eclipse.
- Expertise with tools across the Hadoop environment, including Pig, Hive, HDFS, MapReduce, Sqoop, Spark, Kafka, Flume, Oozie, Zookeeper, and Akka.
- Extensive ETL work covering data sourcing, mapping, transformation, and conversion.
- Experience using Cloudera Manager for installation and management of single-node and multi-node Hadoop clusters.
- Worked in multi-cluster environments and set up Hortonworks Hadoop systems.
- Good Big Data experience using Sqoop to move data between RDBMS and HDFS in both directions.
- Hands-on experience configuring and working with Flume to load data from multiple sources directly into HDFS.
- Participated in converting existing Hadoop jobs to Spark jobs using Spark Core, Spark SQL, Spark Streaming, MLlib, and GraphX.
- Expertise in writing HiveQL and Pig programs (executed as MapReduce) to validate and cleanse data in HDFS obtained from heterogeneous data sources, making it suitable for analysis.
- Analyzed and transformed stored data by writing MapReduce jobs based on business requirements.
- Implemented Apache Solr for fast information retrieval.
- Good knowledge of YARN configuration and Storm real-time processing architecture.
- Managed and scheduled batch jobs on a Hadoop cluster using Oozie.
- Hands-on experience working with Amazon Elastic MapReduce (EMR) and setting up environments on AWS EC2 instances.
- Hands-on experience working with NoSQL databases including MongoDB and HBase.
- Experience optimizing MapReduce jobs to use HDFS efficiently through various compression mechanisms.
- Experience developing Pig Latin scripts to extract data from web server output files and load it into HDFS.
- Experience with highly available Hadoop clusters secured with Kerberos, primarily on the Cloudera Hadoop distribution.
- Integrated Hadoop jobs with Kerberos security for Hive, Impala, and HBase.
- Assisted in loading large sets of structured, semi-structured, and unstructured data.
- Big Data experience in storage, querying, processing, and analysis of data.
- Evaluated ETL and OLAP tools and recommended the most suitable solutions based on business needs.
- Configured services with the Hue web interface.
- Well experienced with networking tools such as PuTTY and WinSCP.
- Created Business Intelligence reports using Tableau.
- Hands-on experience using Hive SerDes to load file formats such as text, JSON, XML, and logs into Hive tables.
- Experience with Avro, ORC, Parquet, and SequenceFile formats and their compression options (a brief sketch follows this summary).
- Hands-on experience writing Python scripts to automate an entire job flow and integrate its steps into one script.
- Knowledge of project implementation methodologies including Waterfall and Agile.
- Ability to adapt to evolving technology, a strong sense of responsibility and accomplishment.
- Willing to relocate: Anywhere.
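A minimal sketch of the kind of format-conversion job described above, assuming Spark 2.x and Scala; the HDFS paths and application name are illustrative assumptions, not taken from any specific project below.

// Reads raw JSON from HDFS and writes Snappy-compressed Parquet.
import org.apache.spark.sql.SparkSession

object JsonToParquet {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("JsonToParquet").getOrCreate()

    // Spark infers the schema from the JSON records.
    val events = spark.read.json("hdfs:///data/raw/events")

    // Persist as columnar Parquet with Snappy compression for efficient HDFS usage.
    events.write
      .option("compression", "snappy")
      .mode("overwrite")
      .parquet("hdfs:///data/curated/events")

    spark.stop()
  }
}

The same pattern extends to ORC output by swapping the writer format.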
TECHNICAL SKILLS:
Operating Systems: Windows 8/7/XP, Ubuntu 13.X.
Hadoop Eco System: Hadoop 1.x/2.x (YARN), HDFS, MapReduce, MongoDB, HBase, Hive, Pig, Zookeeper, Sqoop, Oozie, Flume, Storm, HDP, AWS, Cloudera Desktop, and SVN.
Programming Languages: Scala, Java, PySpark, SQL, C.
Tools: Java, MapReduce, Struts, Hibernate, AJAX, HTML, DHTML, CSS, JavaScript, DOM, jQuery, JSP, JSON, XML, Web Services.
APIs: Servlets, Java Naming and Directory Interface (JNDI), MapReduce.
Development Tools: Eclipse, Command Editor, SQL Developer, Microsoft Suite (Word, Excel, PowerPoint, Access), VMware.
NoSQL Databases: HBase, Cassandra.
PROFESSIONAL EXPERIENCE:
Confidential, Los Angeles, CA
Big Data Developer
Roles & Responsibilities:
- Designed the entire architecture of the data pipeline for analysis.
- Worked on Scala 2.11.2 jobs using Spark 2.1.1 for data processing with the RDD and DataFrame APIs.
- Performance tuned Spark and Sqoop jobs.
- Worked on transforming queries written in Hive into Spark applications.
- Built Spark applications and deployed them on the cluster.
- Worked with Apache NiFi to uncompress and move JSON files from the local file system to HDFS.
- Created Oozie jobs for workflows of Spark, Sqoop, and shell scripts.
- Completed end-to-end design and development of an Apache NiFi flow that acts as the agent between the middleware team and the EBI team and executes all the actions.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Started using Apache NiFi to copy data from the local file system to HDP.
- Created a Spark application to load data into a dynamic-partition-enabled Hive table (see the sketch at the end of this list).
- Worked on stateful transformations in Spark applications.
- Worked on Hive scripts to apply various transformations and save the data in Parquet file format.
- Wrote Scala applications to load processed data into DataStax Cassandra 4.8.
- Wrote a MapReduce job to compare two CSV files and save the processed output into HBase.
- Hands-on design and development of an application using Hive UDFs.
- Responsible for writing HiveQL queries to analyze data in the Hive warehouse.
- Provided support to data analysts in running Pig and Hive queries.
- Created dynamic partitioned tables in Hive.
- Worked on Sqoop jobs to import data from Oracle EDB into HDFS.
- Responsible for defining the data flow within the Hadoop ecosystem and directing the team in implementing it.
- Worked on POCs to integrate Spark with other tools.
- Worked on Data Modeling for Dimension and Fact tables in Hive Warehouse.
- Performed analysis on Kafka streams, feature selection, and feature extraction using Apache Spark machine learning and streaming libraries in Python.
- Involved in administration: installing, upgrading, and managing CDH3, Pig, Hive, and HBase.
- Involved in Cluster coordination services through Zookeeper.
- Designed and Maintained Oozie workflows to manage the flow of jobs in the cluster.
- Parsed JSON files through Spark Core and extracted the schema for production data using Spark SQL and Scala.
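A minimal sketch, in Scala on Spark 2.1, of the dynamic-partition Hive load referenced above; the table names (events_raw, events_by_day), column names, and the assumption that the partitioned target table already exists are illustrative only.

import org.apache.spark.sql.SparkSession

object DynamicPartitionLoad {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("DynamicPartitionLoad")
      .enableHiveSupport()
      .getOrCreate()

    // Dynamic partitioning lets Hive derive partition values from the data itself.
    spark.sql("SET hive.exec.dynamic.partition = true")
    spark.sql("SET hive.exec.dynamic.partition.mode = nonstrict")

    // The partition column (event_date) is selected last to match the layout of
    // the existing target table when using insertInto.
    spark.table("events_raw")
      .select("event_id", "payload", "event_date")
      .write
      .mode("append")
      .insertInto("events_by_day")

    spark.stop()
  }
}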
Environment: Hadoop 2.6, HDFS, Pig, NiFi, Hive, MapReduce, Sqoop, Kafka, CDH3, Cassandra, Python, Oozie, Scala 2.11.2, Spark 2.1.1, SQL, NoSQL, HBase, Flume, Zookeeper, ETL, CentOS, Eclipse, Agile, Apache Phoenix.
Confidential, Charlotte, NC
Big Data Developer
Roles & Responsibilities:
- Involved in installing and configuring Hadoop community and Hortonworks platforms using the HDP 2.3 distribution.
- Handled importing of data from various data sources, performed data control checks using Spark and loaded data into HDFS.
- Worked on Spark SQL for analyzing the data.
- Used Scala to write code for all Spark use cases.
- Explored Spark to improve performance and optimize existing algorithms in Hadoop using SparkContext, Spark SQL, pair RDDs, and YARN.
- Involved in converting HiveQL into Spark transformations using Spark RDDs and Scala.
- Worked in Spark SQL on different data formats like JSON and Parquet.
- Developed Spark UDFs using the Scala shell as per requirements (see the sketch at the end of this list).
- Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in HDFS.
- Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
- Used Kafka for stream processing, tracking website activities, monitoring and aggregation of logs.
- Developed MapReduce (YARN) programs to cleanse the data in HDFS obtained from heterogeneous data sources to make it suitable for ingestion into Hive schema for analysis.
- Integrated Kafka and Storm, using Avro for serializing and deserializing the data and Kafka producers and consumers.
- Performed joins, group-bys, and other operations using Spark UDFs along with Java.
- Designed and developed MapReduce jobs to process data arriving in different file formats such as XML, CSV, and JSON.
- Developed workflows using Oozie to automate the tasks of loading the data into HDFS and pre-processing with Pig.
- Designed and developed multiple database persistence layers using Hibernate, Spark SQL, and Kafka, along with Akka.
- Worked on messaging frameworks like Kafka, including tuning and optimization to meet non-functional requirements and the latencies specified in SLAs.
- Worked on configuring Zookeeper and Kafka clusters.
- Worked on creating Kafka topics and partitions and writing custom partitioner classes.
- Used Sqoop to import data from RDBMS into the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data and analyzed them by running Hive queries.
- Used Oozie for automating end-to-end data pipelines and Oozie coordinators for scheduling the workflows.
- Implemented daily workflow for extraction, processing and analysis of data with Oozie.
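A minimal sketch of a Spark SQL UDF registered from Scala, using the Spark 1.5-era HiveContext to match the environment below; the table and column names (customers, customer_id, phone_raw) and the UDF itself are illustrative assumptions.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object UdfExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("UdfExample"))
    val hiveContext = new HiveContext(sc)

    // Register a UDF that strips non-digit characters from a raw phone number column.
    hiveContext.udf.register("normalize_phone",
      (raw: String) => if (raw == null) null else raw.replaceAll("[^0-9]", ""))

    // Use the UDF directly in HiveQL-style SQL over an existing Hive table.
    hiveContext.sql(
      "SELECT customer_id, normalize_phone(phone_raw) AS phone FROM customers").show(5)

    sc.stop()
  }
}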
Environment: Spark 1.5.2, Spark SQL, Hive, HDFS, HiveQL, YARN, HBase, MapReduce, Sqoop, Flume, Oozie, Kafka, Scala, MongoDB, Pig, Storm, Core Java, Eclipse, Talend.
Confidential, Boston, MA
Jr. Big Data Developer
Roles & Responsibilities:
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Involved in installing, updating, and managing the environment.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Involved in writing Pig scripts for cleansing the data and implemented Hive tables for the processed data in tabular format.
- Involved in running Hadoop streaming jobs to process terabytes of XML-format data. Participated in the requirement gathering and analysis phase of the project, documenting business requirements through workshops/meetings with various business users.
- Involved in ingesting data using Sqoop and HDFS put/copyFromLocal.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig Scripts.
- Implemented test scripts to support test-driven development and continuous integration.
- Implemented SQL, PL/SQL Stored Procedures.
- Involved in developing shell scripts to orchestrate the execution of all other scripts (Pig, Hive, and MapReduce) and move data files within and outside of HDFS.
- Involved in developing Hive UDFs for needed functionality that is not available out of the box in Apache Hive (see the sketch at the end of this list).
- Actively provided upper management with daily updates on project progress, including the classification levels achieved on the data.
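A minimal sketch of a custom Hive UDF, written here in Scala to keep all sketches in one language (any JVM language packaged into the UDF JAR works); the class name, function name, JAR path, and example table are illustrative assumptions.

import org.apache.hadoop.hive.ql.exec.UDF
import org.apache.hadoop.io.Text

// Trims whitespace and lower-cases a string column; returns null for null input.
class NormalizeText extends UDF {
  def evaluate(input: Text): Text = {
    if (input == null) null
    else new Text(input.toString.trim.toLowerCase)
  }
}

// In Hive, after packaging the class into a JAR:
//   ADD JAR /path/to/custom-udfs.jar;
//   CREATE TEMPORARY FUNCTION normalize_text AS 'NormalizeText';
//   SELECT normalize_text(name) FROM customers LIMIT 10;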
Environment: Hadoop, MapReduce, NoSQL, Hive, Pig, Sqoop, Core Java, HDFS, Eclipse.
Confidential
Java Developer
Roles & Responsibilities:
- Developed the presentation layer using JSP, HTML, CSS and client validations using JavaScript.
- Developed user interface using JSP, Struts Tag Libraries to simplify the complexities of the application.
- Developed business logic using stateless session beans for calculating asset depreciation under straight-line and written-down-value approaches.
- Developed all the user interfaces using JSP and the Spring MVC framework.
- Wrote client-side validations using JavaScript.
- Extensively used jQuery for developing interactive web pages.
- Developed the user interface presentation screens using HTML, XML and CSS.
- Developed shell scripts to trigger the Java batch job and send summary emails with the batch job status and processing summary.
- Coordinated with the QA lead on developing the test plan, test cases, test code, and actual testing; responsible for allocating defects and ensuring they were resolved.
- Involved in performing maintenance on the existing code base, developed in the Spring and Hibernate frameworks, by incorporating new features and fixing bugs.
- Used ANT tool to build and deploy applications.
- Involved in configuring web.xml for workflow.
- Wrote SQL queries and created DDL scripts for interacting with the Oracle database.
- Involved in writing Stored Procedures in Oracle to do some database side validations.
Environment: Java, Servlets, JSP, EJB, J2EE, XML, XSLT, JavaScript, CSS, HTML, SQL, PL/SQL, MS Visio, Eclipse, JDBC, Windows XP, WebLogic Portal.
Confidential
Jr. Java Developer
Roles & Responsibilities:
- Analyzed System requirements and designed Use Case Diagrams from requirement specifications.
- Designed the database using data modeling techniques and performed server-side coding in Java.
- Developed JSPs for displaying shopping cart contents and to add, modify, save, and delete cart items.
- Implemented the online shopping module using EJBs, with the business logic implemented per the persistence requirements of the data model using Session and Entity Beans according to the EJB specification.
- Developed UI using HTML, JavaScript, and JSP, and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
- Designed the user interface and implemented validations using JavaScript.
- Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
- Developed various EJBs for handling business logic and data manipulations from database.
Environment: HTML, JAVA, JSP, XML, JDBC.