Hadoop Big Data/Spark Developer Resume
Berkeley Heights, NJ
SUMMARY:
- 7 years of experience in IT, including 6 years with Hadoop/Big Data ecosystems and Java technologies such as HDFS, MapReduce, Apache Pig, Hive, HBase, Spark, Kafka, and Sqoop.
- In-depth knowledge of Hadoop architecture and Hadoop daemons such as NameNode, Secondary NameNode, DataNode, JobTracker, and TaskTracker.
- Experience in writing MapReduce programs using Apache Hadoop for analyzing Big Data.
- Experience with Cloudera, Hortonworks, and MapR distribution components and their custom packages.
- Hands-on experience in writing ad-hoc queries for moving data from HDFS to Hive and analyzing the data using HiveQL.
- Experience with AWS infrastructure services: Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2).
- Experience in importing and exporting data using Sqoop from relational database systems to HDFS.
- Experience in writing Hadoop jobs for analyzing data using Pig Latin commands.
- Good knowledge of analyzing data in HBase using Hive and Pig.
- Working Knowledge in NoSQL Databases like HBase and Cassandra.
- Hands on Experience in setting up automated monitoring and escalation infrastructure for Hadoop Cluster using Ganglia and Nagios.
- Experience in designing and developing POCs in Scala, deploying them on YARN clusters, and comparing the performance of Spark with Hive and SQL/Teradata.
- Experience in integrating BI tools such as Tableau and pulling the required data into the BI tool's in-memory engine.
- Experience in launching EC2 instances in Amazon EMR using the console.
- Extended Hive and Pig core functionality by writing custom UDFs, UDAFs, and UDTFs.
- Experience in administrative tasks such as installing Hadoop and its ecosystem components such as Hive and Pig in Distributed Mode.
- Experience in using Apache Flume for collecting, aggregating and moving large amounts of data from application servers.
- Passionate about working in Big Data and Analytics environments.
- Knowledge of reporting tools such as Tableau, used for analytics on data in the cloud.
- Extensive experience with SQL, PL/SQL, Shell Scripting and database concepts.
- Experience with front end technologies like HTML, CSS and JavaScript.
- Experience working on Windows and UNIX/Linux platforms with technologies such as Big Data, SQL, XML, HTML, Core Java, and Shell Scripting.
TECHNICAL SKILLS:
Database: DB2, MySQL, Oracle, MS SQL Server
Languages: Core Java, PIG Latin, SQL, Hive QL, Shell Scripting and XML
APIs/Tools: NetBeans, Eclipse, MySQL Workbench, Visual Studio
Web Technologies: HTML, XML, JavaScript, CSS
Big Data Ecosystem: HDFS, Pig, MapReduce, Hive, Kafka, Sqoop, Flume, HBase
Operating System: Unix, Linux, Windows XP
Visualization Tools: Tableau, Zeppelin
Virtualization Software: VMware, Oracle Virtual Box.
Cloud Computing Services: AWS (Amazon Web Services).
PROFESSIONAL EXPERIENCE:
Confidential, Berkeley Heights, NJ
Hadoop Big Data/Spark Developer
Responsibilities:
- Analyzed the requirements to set up a cluster.
- Developed MapReduce programs in Java for parsing the raw data and populating staging tables.
- Created Hive queries to compare the raw data with EDW reference tables and perform aggregations.
- Imported and exported data into HDFS and Hive using Sqoop.
- Wrote Pig scripts to process the data.
- Developed and designed Hadoop, Spark and Java components.
- Developed Spark programs to parse the raw data, populate staging tables, and store the refined data in partitioned tables in the EDW.
- Developed Spark code using Scala and Spark SQL for faster processing and testing (a brief sketch of this pattern follows this list).
- Developed PIG Latin scripts to extract the data from the web server output files to load into HDFS.
- Involved in HBase setup and in storing data into HBase for use in further analysis.
- Explored Spark for improving the performance and optimizing existing algorithms in Hadoop using Spark Context, Spark SQL, DataFrames, pair RDDs, and Spark on YARN, and converted Hive queries into Spark transformations using Spark RDDs.
- Used the Oracle Big Data Appliance (BDA) platform for running diverse workloads on Hadoop and NoSQL systems.
- Created applications that monitor consumer lag within Apache Kafka clusters; used in production by multiple companies.
- Developed Unix/Linux Shell Scripts and PL/SQL procedures.
- Configured the Oracle database appliance and deployed it after setup was completed.
- Worked towards creating real time data streaming solutions using Apache Spark/Spark Streaming, Kafka.
- Performed performance optimizations on Spark/Scala; diagnosed and resolved performance issues.
- Installed and configured Hive and written Hive UDFs.
- Involved in converting MapReduce programs into Spark transformations using Spark RDDs with Scala and Python.
- Involved in creating Hive tables, loading them with data, and writing Hive queries in HiveQL, which run internally as MapReduce jobs.
- Installed application on AWS EC2 instances and configured the storage on S3 buckets.
- Loaded some of the data into Cassandra for fast retrieval of data.
- Involved in developing Spark code using Scala and Spark SQL for faster testing and processing of data, and explored optimizations using Spark Context, Spark SQL, pair RDDs, and Spark on YARN.
- Exported the analyzed data to relational databases using Sqoop for visualization and for report generation by our BI team.
- Imported data from AWS S3 into Spark RDDs and performed transformations and actions on them.
- Extracted data from Cassandra through Sqoop, placed it in HDFS, and processed it.
- Implemented Big Data solutions on the Hortonworks distribution and the AWS cloud platform.
- Developed Pig Latin scripts for data transformation.
- Extracted the data from MySQL into HDFS using SQOOP.
- Managed and monitored the Hadoop cluster using Cloudera Manager.
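As a concrete illustration of the Spark/Scala staging pattern described above, a minimal sketch; the input path, column names, and target table are hypothetical placeholders, not the actual project artifacts:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StageRawData {
  def main(args: Array[String]): Unit = {
    // Hive support lets the job write into EDW-style partitioned tables.
    val spark = SparkSession.builder()
      .appName("StageRawData")
      .enableHiveSupport()
      .getOrCreate()

    // Parse raw pipe-delimited files from HDFS into a DataFrame.
    val raw = spark.read
      .option("delimiter", "|")
      .option("header", "true")
      .csv("hdfs:///data/raw/transactions/")   // hypothetical input path

    // Light cleansing before loading the staging table.
    val staged = raw
      .withColumn("txn_date", to_date(col("txn_date"), "yyyy-MM-dd"))
      .filter(col("txn_id").isNotNull)

    // Write the refined data into a partitioned table in the EDW schema.
    staged.write
      .mode("overwrite")
      .partitionBy("txn_date")
      .saveAsTable("edw_stage.transactions")   // hypothetical table name

    spark.stop()
  }
}
```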
Environment: Hadoop, Cloudera distribution, Hortonworks distribution, AWS, EMR, Azure cloud platform, HDFS, MapReduce, Oracle Big Data Appliance, DocumentDB, Kafka, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, Core Java, Impala, HiveQL, Spark, UNIX/Linux Shell Scripting.
Confidential, Bloomfield, CT
Big Data Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from LINUX file system to HDFS.
- Working experience with HDFS admin shell commands.
- Worked in loading data from Teradata, AWS into HDFS using Sqoop.
- Experience in ETL methods for data extraction, transformation and loading in corporate-wide ETL Solutions and Data warehouse tools for reporting and data analysis.
- Understanding/knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, Name Node and Data Node concepts.
- Developed Kafka producer and consumers, HBase clients, Apache Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.
- Built code for real-time data ingestion using Java, MapR Streams (Kafka), and Storm.
- Designed, configured and managed public/private cloud infrastructures utilizing AWS.
- Used the Spark Streaming API with Kafka to build live dashboards; worked on RDD transformations and actions, pair RDD operations, checkpointing, and SBT (a brief sketch of this pattern follows this list).
- Setting up data in AWS using S3 bucket and configuring instance backups to S3 bucket.
- Used Kafka to transfer data from different data systems to HDFS.
- Migrated complex MapReduce programs into Spark RDD transformations and actions.
- Developed a Spark Streaming application for one of the data sources using Scala and Spark, applying the required transformations.
- Installed and configured systems for use with the Cloudera distribution of Hadoop (with consideration given to other variants of Hadoop such as Apache, MapR, Hortonworks, Pivotal, etc.).
- Installed, configured, and administered all UNIX/Linux servers, including the design and selection of relevant hardware to support the installation and upgrade of Red Hat and CentOS operating systems.
- Developed a Scala script to read all the Parquet tables in a database and write them out as JSON files, and another script to load them as structured tables in Hive.
- Developed a data pipeline using Pig and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Implemented Spark RDD transformations to map business analysis and apply actions on top of transformations.
- Used Spark to parse XML files, extract values from tags, and load them into multiple Hive tables.
- Experience with different Hadoop distributions such as Cloudera and Hortonworks.
- Hands-on experience with Cassandra.
- Analyzed large data sets by running Hive queries and Pig scripts.
- Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.
- Hands-on use of Sqoop to import and export data between RDBMS and HDFS.
- Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Performed Linux systems administration on production and development servers (Red Hat Linux, CentOS, and other UNIX utilities).
- Installation, upgrade, and administration of Sun Solaris and Red Hat Linux.
- Used Sqoop, Avro, Hive, Pig, Java, and MapReduce daily to develop ETL, batch processing, and data storage functionality.
- Supported the implementation and execution of MapReduce programs in a cluster environment.
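A minimal sketch of the Spark Streaming plus Kafka dashboard-feed pattern referenced above, assuming the spark-streaming-kafka-0-10 integration; the broker address, topic, group id, and checkpoint directory are hypothetical:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object ClickStreamDashboard {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("ClickStreamDashboard")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint("hdfs:///checkpoints/clickstream")   // hypothetical checkpoint dir

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:9092",            // hypothetical broker
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "dashboard-consumer",
      "auto.offset.reset" -> "latest"
    )

    // Direct stream from the (hypothetical) "clicks" topic.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Seq("clicks"), kafkaParams))

    // Count events per page in each micro-batch; a real dashboard feed would
    // push this downstream instead of printing it.
    stream.map(record => (record.value.split(",")(0), 1L))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Checkpointing to HDFS keeps offsets and intermediate state recoverable across driver restarts, which is what makes a continuously updating dashboard feed practical.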
Environment: Hadoop, MapReduce, Hive, Pig, HBase, Sqoop, Kafka, Cassandra, Flume, Java, SQL, AWS, Cloudera Manager, Eclipse, Unix Shell Scripting, YARN.
Confidential, Columbus, OH
Hadoop Engineer
Responsibilities:
- Wrote MapReduce code to parse data from various sources and store the parsed data in HBase and Hive.
- Integrated MapReduce with HBase to bulk-import data into HBase using MapReduce programs.
- Imported data from different relational data sources like Oracle, Teradata to HDFS using Sqoop.
- Worked on a stand-alone as well as a distributed Hadoop application.
- Worked in an AWS environment for the development and deployment of custom Hadoop applications.
- Used Scala to convert Hive/SQL queries into RDD transformations in Apache Spark (see the sketch after this list).
- Maintained and administered HDFS; created Hive tables to store the processed results in tabular format. Created 25+ Linux Bash scripts for users, groups, data distribution, capacity planning, and system monitoring.
- Used Oozie and Zookeeper to automate the flow of jobs and coordination in the cluster respectively.
- Used Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
- Administered and maintained Cloudera Hadoop clusters; provisioned physical Linux systems and patched and maintained them.
- Worked primarily through Cloudera Manager with some command-line administration; designed scalable Big Data clusters.
- Extensive knowledge of Pig scripts using bags and tuples, and of Pig UDFs to pre-process data for analysis.
- Provided security and authentication with Apache Ranger, where Ranger Admin provides administration and User Sync adds new users to the cluster.
- Implemented usage of Amazon EMR for processing Big Data across a Hadoop Cluster of virtual servers.
- Used Teradata both in building the Hadoop project and as part of the ETL project.
- Developed several shell scripts that act as wrappers to start these Hadoop jobs and set the configuration parameters.
- Involved in writing queries using Impala for better and faster processing of data.
- Involved in developing Impala scripts for extraction, transformation, loading of data into data warehouse.
- Experienced in migrating HiveQL into Impala to minimize query response time.
- Experience in installing and configuring Cloudera, MapR, and Hortonworks clusters and installing Hadoop ecosystem components such as Pig, Hive, HBase, Sqoop, Kafka, Oozie, Flume, and ZooKeeper.
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HDFS for further analysis.
- Developed testing scripts in Python, prepared test procedures, analyzed test result data, and suggested improvements to the system and software.
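A minimal sketch of converting a HiveQL query into Spark RDD transformations in Scala, as referenced above; the table and column names are hypothetical, and the job is assumed to be submitted with --master yarn:

```scala
import org.apache.spark.sql.SparkSession

object HiveToSparkExample {
  def main(args: Array[String]): Unit = {
    // Hive support exposes the warehouse tables to Spark.
    val spark = SparkSession.builder()
      .appName("HiveToSparkExample")
      .enableHiveSupport()
      .getOrCreate()
    import spark.implicits._

    // Original HiveQL (for reference):
    //   SELECT region, COUNT(*) AS orders FROM sales.orders GROUP BY region;

    // Equivalent expressed as RDD transformations instead of a Hive query.
    val orders = spark.table("sales.orders")           // hypothetical table
    val countsByRegion = orders.rdd
      .map(row => (row.getAs[String]("region"), 1L))
      .reduceByKey(_ + _)

    countsByRegion.toDF("region", "orders").show()
    spark.stop()
  }
}
```

Expressing the aggregation as RDD transformations mirrors the migration path described above; in practice the same result can also be produced with spark.sql against the Hive table.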
Environment: HDFS, MapReduce, Python, CDH5, HBase, NoSQL, AWS, Hive, Pig, Hadoop, Sqoop, Impala, YARN, Shell Scripting, Ubuntu, Red Hat Linux.
Confidential
Java Developer
Responsibilities:
- Involved in design and development phases of Software Development Life Cycle (SDLC) using Scrum methodology.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
- Involved in installing and configuring the Hadoop ecosystem and Cloudera Manager using the CDH4 distribution.
- Collected log data from web servers and integrated it into HDFS using Flume.
- Involved in creating Hive Tables, loading with data and writing Hive queries which will invoke and run Map Reduce jobs in the backend.
- Worked on analyzing Hadoop cluster and different big data analytic tools including Map Reduce, Hive and Spark.
- Developed MapReduce programs to parse the raw data and store the pre-aggregated data in partitioned tables.
- Involved in the end-to-end process of Hadoop cluster installation, configuration, and monitoring.
- Responsible for building scalable distributed data solutions using Hadoop and Involved in submitting and tracking Map Reduce jobs using Job Tracker.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Worked with HBase in creating tables to load large sets of semi-structured data coming from various sources (a brief sketch follows this list).
- Created design documents and reviewed with team in addition to assisting the business analyst / project manager in explanations to line of business.
- Responsible for understanding the scope of the project and requirement gathering.
- Involved in analysis, design, construction and testing of the application
- Developed the web tier using JSP to show account details and summary.
- Designed and developed the UI using JSP, HTML, CSS and JavaScript.
- Used Tomcat web server for development purpose.
- Involved in creation of Test Cases for JUnit Testing.
- Used Oracle as the database and Toad for query execution; wrote SQL scripts and PL/SQL code for procedures and functions.
- Developed application using Eclipse and used build and deploy tool as Maven.
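A minimal sketch of the HBase table creation and load pattern referenced above, written in Scala against the current HBase client builder API; the table name, column family, row key, and sample values are hypothetical:

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ColumnFamilyDescriptorBuilder, ConnectionFactory, Put, TableDescriptorBuilder}
import org.apache.hadoop.hbase.util.Bytes

object HBaseLoadExample {
  def main(args: Array[String]): Unit = {
    // Connection settings come from hbase-site.xml on the classpath.
    val conf = HBaseConfiguration.create()
    val connection = ConnectionFactory.createConnection(conf)
    val admin = connection.getAdmin

    // Create a table with one column family for the semi-structured records.
    val tableName = TableName.valueOf("customer_events")          // hypothetical table
    if (!admin.tableExists(tableName)) {
      val descriptor = TableDescriptorBuilder.newBuilder(tableName)
        .setColumnFamily(ColumnFamilyDescriptorBuilder.of("d"))
        .build()
      admin.createTable(descriptor)
    }

    // Write one row: a composite row key plus a few qualified cells.
    val table = connection.getTable(tableName)
    val put = new Put(Bytes.toBytes("cust-001#2016-05-01"))
    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("source"), Bytes.toBytes("web"))
    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"), Bytes.toBytes("""{"event":"login"}"""))
    table.put(put)

    table.close()
    connection.close()
  }
}
```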
Environment: Hadoop, HBase, HDFS, Pig Latin, Sqoop, Hive, Java, J2EE, Servlets, JSP, JUnit, AJAX, XML, JavaScript, Maven, Eclipse, Apache Tomcat, and Oracle.
Confidential
Java Developer
Responsibilities:
- Hands-on experience with the J2EE framework; extensively used Core Java and the Spring API in developing the business logic, following Agile methodology.
- Implemented Model View Controller (MVC) Architecture based presentation using JSF framework.
- Worked on Servlets, JSP, Drools, JDBC and JavaScript under MVC Framework and implemented OOAD concept in the applications.
- Extensive experience developing Representational State Transfer (REST) based services and Simple Object Access Protocol (SOAP) based services.
- Used Oracle as the database, developed the PL/SQL backend implementation, and created Select, Update, and Delete statements in SQL.
- Involved in designing of user interface.
- Extensively used the J2EE design patterns like Session Façade, Business Object (BO), Service Locator, Data Transfer Object (DTO) and Data Access Object (DAO), Singleton, Factory.
- Involved in writing EJBs (Stateless Session Beans) and Web Services for building the middleware distributed components and deployed them on application servers.
- Developed RESTFUL web service and hands on experience in JSON parsing and XML parsing.
- Implemented the Spring Data and Hibernate framework (ORM) to interact with database.
- Designed and developed web pages using HTML, JSP, JavaScript and XSLT, involved in writing new JSPs, designed pages using HTML and client validation using Angular JS, jQuery and JavaScript.
- Performed Unit testing and Integration Testing.
- Involved in Agile methodology with respect to the successful development of the project.
- Deployed GUI code to the WebLogic application environment and standalone components to the JBoss server.
- Developed web services to perform various operations on the supplier information.
- Supported the applications through debugging, fixing and maintenance releases.
- Involved in mapping the data from various vendors with the existing database.
- Responsible for updating the supplier database if new updates are available.
- Responsible for requirements gathering, analyzing and developing design documents and reviewing with business.
- Involved in Units integration, bug fixing and User acceptance testing with test cases.
Environment: Java 1.8, J2EE, Servlets, JSF, jQuery, Spring 3 (Spring MVC, Spring Annotations, Spring AOP), Microsoft SQL Server, Log4j, JDBC, Spring JDBC, JUnit, XML, Hibernate, Swing, Unix, Windows, JavaScript, AJAX, REST, PL/SQL, CSS, Maven, Linux.