
Hadoop/Spark Resume


Syracuse, NY

SUMMARY

  • 7+ years of professional experience in IT in analysis, design, development, testing, documentation, deployment, integration, and maintenance of web-based and client/server applications using Java and Big Data components.
  • 4+ years of relevant experience in the design and development of Big Data analytics using Apache Hadoop ecosystem components such as MapReduce, HDFS, HBase, Hive, Impala, Sqoop, Pig, Oozie, ZooKeeper, and Flume.
  • Worked with HDFS and Hadoop framework components including MapReduce, Pig, Hive, Sqoop, Flume, ZooKeeper, Oozie, and HBase.
  • Experience in developing MapReduce programs using Apache Hadoop for data analysis.
  • Hands-on experience with YARN (MapReduce 2.0) architecture and components such as ResourceManager, NodeManager, Container, and ApplicationMaster, and with the execution of a MapReduce job.
  • Developed Pig and Hive scripts.
  • Worked with multiple databases, including RDBMS technologies (MySQL, Oracle) and NoSQL databases (Cassandra, HBase, Neo4j).
  • Capable of provisioning, installing, configuring, monitoring, and maintaining HDFS, YARN, HBase, Sqoop, Pig, and Hive.
  • Developed Pig and Hive UDFs.
  • Worked with different Hadoop distributions such as Cloudera and Hortonworks.
  • Participated in Hadoop cluster administration, including adding and removing cluster nodes, cluster capacity planning, performance tuning, cluster monitoring, and troubleshooting.
  • Capable of processing large sets of structured, semi-structured, and unstructured data and supporting systems application architecture.
  • Supported Hadoop developers and assisted in optimizing MapReduce jobs and Hive scripts.
  • Experienced in all stages of Software Development Life Cycle including proposal, process engineering, requirement analysis, design, development, testing, deployment and support.
  • Extensive experience using the MVC (Model-View-Controller) architecture for developing applications with JSP, JSTL, JavaBeans, and Servlets.
  • Experience in web application development using open-source MVC implementations such as the Spring Framework.
  • Strong implementation knowledge of Java and J2EE design patterns.
  • Experience developing both SOAP (JAX-WS) and REST (JAX-RS) web services, covering both the consumer and provider ends, in enterprise application development.
  • Worked on databases such as Oracle and MySQL.
  • Developed Spark SQL jobs to load data into HDFS instead of using Sqoop, which improved performance.
  • Applied Spark best practices such as partitioning, broadcasting, and checkpointing (a minimal sketch follows this list).
  • Created calculated columns in Spark data streams.
  • Worked with YARN, Mesos, and the Spark standalone scheduler.
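
A minimal, illustrative sketch (in Java) of the partitioning, broadcasting, and checkpointing practices mentioned above; the paths, column names, and partition count are assumptions for illustration, not code from any of the projects below.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;
    import static org.apache.spark.sql.functions.broadcast;

    public class SparkPracticesSketch {
        public static void main(String[] args) {
            SparkSession spark = SparkSession.builder()
                    .appName("spark-practices-sketch")
                    .getOrCreate();
            // Checkpoint location is a hypothetical HDFS path.
            spark.sparkContext().setCheckpointDir("hdfs:///tmp/checkpoints");

            Dataset<Row> transactions = spark.read().parquet("hdfs:///data/transactions"); // large table
            Dataset<Row> regions      = spark.read().parquet("hdfs:///data/regions");      // small lookup

            // Partitioning: control parallelism ahead of shuffle-heavy stages.
            Dataset<Row> repartitioned = transactions.repartition(200, transactions.col("region_id"));

            // Broadcasting: ship the small lookup table to every executor to avoid shuffling the large table.
            Dataset<Row> joined = repartitioned.join(broadcast(regions), "region_id");

            // Checkpointing: truncate the lineage so downstream stages restart from materialized data.
            Dataset<Row> checkpointed = joined.checkpoint();

            checkpointed.write().mode("overwrite").parquet("hdfs:///data/enriched");
            spark.stop();
        }
    }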

TECHNICAL SKILLS

Operating Systems: Ubuntu Linux, Windows, CentOS

Languages: Java, Scala, SQL, Pig Latin, HiveQL, Unix shell scripting, JavaScript, CSS

Databases: MySQL, Oracle DB, HBase, Hive, Cassandra, AWS

Web Technology: HTML, CSS, JavaScript

Tools & Utilities: EditPlus, Notepad++, Eclipse 3.5, NetBeans, SQL*Plus, GitHub, WinSCP

Agile tools: Jira

Hadoop Technologies: Hadoop, Hive, HBase, Pig, Sqoop, Oozie, Flume, Cassandra, Spark, Scala

Technologies of interest to learn: Elasticsearch, Splunk, and Kibana.

PROFESSIONAL EXPERIENCE

Confidential, Syracuse, NY

Hadoop/Spark

Responsibilities:

  • Responsible for managing data coming from different sources.
  • Involved in loading data from the Linux file system into HDFS.
  • Maintaining and monitoring clusters. Loaded data into cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
  • Involved in writing Flume and Hive scripts to extract, transform and load the data into Database.
  • Developed a Spark application to process HDFS data and perform in-memory operations.
  • Used the HBase database to store SCD (slowly changing dimension) Type 2 data.
  • Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
  • Loaded data into the database and accessed it through different ecosystem components.
  • Imported and exported data between RDBMS and HDFS using Sqoop.
  • Developed custom Hive and Pig UDFs to maintain a uniform date format across HDFS (a sketch of such a UDF follows this list).
  • Implemented partitioning and bucketing concepts in Hive to optimize storage.
  • Used Avro, Parquet, and ORC data formats to store data in HDFS.
  • Used Snappy and bzip2 compression techniques to minimize disk usage.
  • Loaded data into Spark RDDs and performed in-memory computation for faster response times.
  • Used AWS services like S3, EC2 for smaller datasets.
  • Worked on a POC for Apache Kafka and Spark Streaming.
  • Worked with the Spark ecosystem, using Spark SQL queries on data formats such as text, CSV, and XML files.
  • Worked with the Kafka message queue for Spark Streaming.
  • Used Kerberos to enable security for databases, and created secure passwords using JCEKS for Flume.
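
Illustrative only: a minimal Java sketch of the kind of Hive UDF described above for keeping dates in a single format across HDFS. The class name, accepted input patterns, and target format are assumptions rather than the actual project code.

    import java.text.ParseException;
    import java.text.SimpleDateFormat;
    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: normalizes several incoming date formats to yyyy-MM-dd.
    public class NormalizeDateUDF extends UDF {
        private static final String[] INPUT_PATTERNS = {"MM/dd/yyyy", "dd-MMM-yyyy", "yyyyMMdd"};
        private final SimpleDateFormat output = new SimpleDateFormat("yyyy-MM-dd");

        public Text evaluate(Text rawDate) {
            if (rawDate == null || rawDate.toString().trim().isEmpty()) {
                return null; // pass nulls and empty strings through
            }
            String value = rawDate.toString().trim();
            for (String pattern : INPUT_PATTERNS) {
                try {
                    SimpleDateFormat in = new SimpleDateFormat(pattern);
                    in.setLenient(false);
                    return new Text(output.format(in.parse(value)));
                } catch (ParseException ignored) {
                    // not this pattern; try the next one
                }
            }
            return rawDate; // unknown format: return the value unchanged
        }
    }

A UDF like this would typically be registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before being called from HiveQL.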

Environment: HDP 2.3.4, Hadoop, Hive, HDFS, HPC, WebHDFS, WebHCat, Spark, Spark SQL, Kafka, Java, Scala, web servers, Maven and SBT builds.

Confidential, Menomonee Falls, WI

Hadoop/Spark

Responsibilities:

  • Worked with a cluster configuration of 51 nodes, 12.21 TB of total memory, and 1,600 cores.
  • Worked extensively with the Hortonworks Hadoop distribution with Kerberos enabled.
  • Ingested data from different data sources (IBM Cloudant, Galaxy, SQL Server) into the data lake using NiFi (NiagaraFiles).
  • Developed Spark applications for both batch and streaming processing.
  • Created Hive external tables with ORC, Bucketing & Transactional properties (ACID properties).
  • Created a repository in GitHub (version control system) to store project and keep track of changes to files.
  • Used Eclipse Neon to develop Spark applications in the PyDev perspective.
  • Used IDLE (Python GUI) to develop Python code and incorporate it into Spark applications.
  • Performed SQL joins across Hive tables to produce input for the Spark batch process.
  • Was an active member in developing a streaming application using Apache Kafka.
  • Developed a Spark Streaming application using Kafka with a batch interval of 10 seconds (sketched after this list).
  • After pulling data from the Kafka topic into a DataFrame, filtered out bad records (non-JSON, empty records) before calling the REST API, reducing redundant API calls.
  • Ingested structured and semi-structured data in all formats, including relational database extracts and JSON, into HDFS using NiFi and Kafka.
  • Pulled data from Amazon S3 buckets into the data lake, built Hive tables on top of it, created Spark DataFrames over that data, and performed further analysis.
  • Used the Tez execution engine for faster job execution.
  • Converted existing Hive queries into Spark, improving overall job performance from 80% to 24% of cluster resources and reducing runtime from 80 minutes to 20 minutes.
  • Used coalesce and repartition on DataFrames while optimizing Spark jobs.
  • Used DbVisualizer, a database tool, to query Hive tables for better visualization.
  • Responsible for handling different data formats such as Avro, ORC, and CSV.
  • Automated workflows using shell scripts that pull developed code from GitHub into Hadoop.
  • Used PuTTY (an SSH client) to connect remotely to the servers.
  • Used Rally (a GUI tool) to keep track of all the user stories and tasks.
  • Involved in daily Scrum meetings to discuss development progress and was active in making the meetings more productive.
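
A minimal sketch of a Spark Streaming consumer with a 10-second batch interval that drops empty and non-JSON records before any downstream REST call, as described above. The project code itself was written in Python; this Java version (DStream API, spark-streaming-kafka-0-10) only approximates the approach, and the broker, topic, group id, and REST call are placeholders.

    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.common.serialization.StringDeserializer;
    import org.apache.spark.SparkConf;
    import org.apache.spark.streaming.Durations;
    import org.apache.spark.streaming.api.java.JavaInputDStream;
    import org.apache.spark.streaming.api.java.JavaStreamingContext;
    import org.apache.spark.streaming.kafka010.ConsumerStrategies;
    import org.apache.spark.streaming.kafka010.KafkaUtils;
    import org.apache.spark.streaming.kafka010.LocationStrategies;

    public class KafkaStreamSketch {
        public static void main(String[] args) throws InterruptedException {
            SparkConf conf = new SparkConf().setAppName("kafka-stream-sketch");
            // 10-second batch interval, matching the frequency described above.
            JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(10));

            Map<String, Object> kafkaParams = new HashMap<>();
            kafkaParams.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
            kafkaParams.put("key.deserializer", StringDeserializer.class);
            kafkaParams.put("value.deserializer", StringDeserializer.class);
            kafkaParams.put("group.id", "stream-sketch");
            kafkaParams.put("auto.offset.reset", "latest");

            JavaInputDStream<ConsumerRecord<String, String>> stream =
                    KafkaUtils.createDirectStream(
                            jssc,
                            LocationStrategies.PreferConsistent(),
                            ConsumerStrategies.<String, String>Subscribe(
                                    Arrays.asList("events"), kafkaParams)); // hypothetical topic

            // Drop empty and non-JSON payloads before calling the downstream REST API.
            stream.map(ConsumerRecord::value)
                  .filter(v -> v != null && !v.trim().isEmpty() && v.trim().startsWith("{"))
                  .foreachRDD(rdd -> rdd.foreach(json -> {
                      System.out.println(json); // stand-in for the project's REST call
                  }));

            jssc.start();
            jssc.awaitTermination();
        }
    }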

Environment: MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Splunk, Flume, Oracle 11g, Core Java, Cloudera, Eclipse, Python, Scala, Spark, SQL, Teradata, Unix shell scripting.

Confidential, Dallas, TX

Hadoop

Responsibilities:

  • Used Bash shell scripting, Sqoop, Avro, Hive, Pig, Java, and MapReduce daily to develop ETL, batch processing, and data storage functionality.
  • Used Pig for data transformations, event joins, and some pre-aggregations before storing the data in HDFS.
  • Used the Hadoop MySQL connector to store MapReduce results in an RDBMS.
  • Analyzed large amounts of data sets to determine optimal way to aggregate and report on it.
  • Worked on loading all tables from the reference source database schema through Sqoop.
  • Designed, coded, and configured server-side J2EE components such as JSP, AWS, and Java.
  • Collected data from different databases (Oracle, MySQL) into Hadoop.
  • Used Oozie and Zookeeper for workflow scheduling and monitoring.
  • Worked on designing and developing ETL workflows using Java for processing data in HDFS/HBase using Oozie.
  • Experienced in managing and reviewing Hadoop log files.
  • Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
  • Extracted files from MySQL through Sqoop, placed them in HDFS, and processed them.
  • Supported MapReduce programs running on the cluster (a minimal example of such a job is sketched after this list).
  • Provided cluster coordination services through ZooKeeper.
  • Involved in loading data from UNIX file system to HDFS.
  • Created several Hive tables, loaded them with data, and wrote Hive queries that run internally as MapReduce jobs.
  • Developed simple to complex MapReduce jobs using Hive and Pig.
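
A minimal example of the kind of MapReduce job supported on this cluster, referenced above: a simple per-key aggregation in Java. The tab-separated input layout, field position, and class names are hypothetical.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class EventCountJob {

        // Emits (event type, 1) for each well-formed tab-separated line.
        public static class EventMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text eventType = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split("\t");
                if (fields.length > 1 && !fields[1].isEmpty()) { // skip malformed rows
                    eventType.set(fields[1]);
                    context.write(eventType, ONE);
                }
            }
        }

        // Sums the counts for each event type.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) {
                    sum += v.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "event-count-sketch");
            job.setJarByClass(EventCountJob.class);
            job.setMapperClass(EventMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }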

Environment: Apache Hadoop, AWS, MapReduce, HDFS, Hive, Java (JDK 1.6), SQL, Pig, ZooKeeper, flat files, Oracle 11g/10g, MySQL, Windows NT, UNIX, Sqoop, Oozie, HBase.

Confidential, Texas, TX

Java/J2EE Developer

Responsibilities:

  • Designed Use Case and Sequence Diagrams according to UML standard using Rational Rose.
  • Implemented Model View Controller (MVC-2) architecture and developed Form classes, Action Classes for the entire application using Struts Framework.
  • Performed client-side validations using JavaScript and server-side validations using the built-in Struts Validation Framework.
  • Implemented the data persistence functionality of the application by using Hibernate to persist java objects to the relational database.
  • Used Hibernate Annotations to reduce time at the configuration level and accessed Annotated bean from Hibernate DAO layer.
  • Worked on various SOAP and RESTful web services used in various internal applications.
  • Used the SOAP UI tool for testing the RESTful web services.
  • Used HQL statements and procedures to fetch data from the database.
  • Transformed, navigated, and formatted XML documents using XSL and XSLT.
  • Used Java 1.8 lambda expressions extensively to remove boilerplate code and extend functionality.
  • Used a lambda expression to further improve SackEmployees and avoid the need for a separate class (see the sketch after this list).
  • Used JMS for asynchronous exchange of message by applications on different platforms.
  • Developed the view components using JSP, HTML, Struts Logic tags and Struts tag libraries.
  • Involved in designing and implementation of Session Facade, Business Delegate, Service Locator patterns to delegate request to appropriate resources.
  • Used JUnit Testing Framework for performing Unit testing.
  • Deployed the application in WebSphere Application Server and developed it using Rational Application Developer (RAD).
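
A small sketch of what replacing an anonymous inner class with a Java 1.8 lambda looks like; the SackEmployees class itself is not shown in this resume, so a generic Predicate-based filter stands in for it here.

    import java.util.Arrays;
    import java.util.List;
    import java.util.function.Predicate;

    public class LambdaSketch {
        public static void main(String[] args) {
            List<String> employees = Arrays.asList("Adams", "Baker", "Clark");

            // Pre-Java-8 style: an anonymous inner class adds boilerplate.
            Predicate<String> legacyCheck = new Predicate<String>() {
                @Override
                public boolean test(String name) {
                    return name.startsWith("A");
                }
            };

            // Java 1.8 lambda: the same behavior without a separate class.
            Predicate<String> lambdaCheck = name -> name.startsWith("A");

            employees.stream().filter(lambdaCheck).forEach(System.out::println);
        }
    }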

Environment: Struts 2.0, Hibernate 3.0, JSP, JDK 1.7, RAD, JMS, CVS, JavaScript, XSL, XSLT, lambda expressions, Servlets 2.5, WebSphere Application Server, Oracle 10g.

Confidential

SQL/PLSQL

Responsibilities:

  • Developed advanced PL/SQL packages, procedures, triggers, functions, indexes, and collections to implement business logic using SQL Navigator. Generated server-side PL/SQL scripts for data manipulation and validation, and materialized views for remote instances (an example invocation is sketched after this list).
  • Participated in change and code reviews to understand the testing needs of the changed components. Worked on troubleshooting defects in a timely manner.
  • Involved in defragmentation of tables, partitioning, compression, and indexing for improved performance and efficiency. Involved in table redesign with implementation of partitioned tables and partition indexes to make the database faster and easier to maintain.
  • Experience in database application development, query optimization, performance tuning, and DBA solutions, with implementation experience across the complete system development life cycle.
  • Used the SQL Server SQL*Loader tool to build high-performance data integration solutions, including extraction, transformation, and load packages for data warehousing. Extracted data from XML files and loaded it into the database.
  • Designed and developed Oracle Forms & Reports, generating up to 60 reports.
  • Used principles of normalization to improve performance. Involved in writing ETL code using PL/SQL to meet requirements for extraction, transformation, cleansing, and loading of data from source to target data structures.

Environment: SQL Server 2005, T-SQL, PL/SQL, DTS Designer, MS Office, MS Excel, VSS
