Big Data/Hadoop Developer Resume
Fort Lauderdale, FL
PROFESSIONAL SUMMARY:
- 8+ years of working experience as a Big Data/Hadoop Developer designing and developing applications using big data, Hadoop and Java/J2EE open-source technologies.
- Strong development skills in Hadoop, HDFS, Map Reduce, Hive, Sqoop, HBase with solid understanding of Hadoop internals.
- Experience in programming and development of Java modules for an existing Java-based web portal using technologies like JSP, Servlets, JavaScript, HTML and SOA with MVC architecture.
- Expertise in ingesting real-time/near-real-time data using Flume, Kafka and Storm.
- Good knowledge of NoSQL databases like MongoDB, Cassandra and HBase.
- Excellent knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MRv1 and MRv2 (YARN).
- Expertise in writing Hadoop jobs to analyze data using MapReduce, Apache Crunch, Hive, Pig, Solr and Splunk.
- Hands-on experience in installing, configuring and using Apache Hadoop ecosystem components like Hadoop Distributed File System (HDFS), MapReduce, Pig, Hive, HBase, Apache Crunch, ZooKeeper, Sqoop, Hue, Scala and Avro.
- Strong programming skills in designing and implementing multi-tier applications using Java, J2EE, JDBC, JSP, JSTL, HTML, CSS, JSF, Struts, JavaScript, Servlets, POJO, EJB, XSLT and JAXB.
- Extensive experience in SOA-based solutions: Web Services, Web API, WCF, SOAP and RESTful API services.
- Good knowledge of Amazon Web Services (AWS) concepts like EMR and EC2, which provide fast and efficient processing for Teradata Big Data Analytics.
- Experienced in collecting log data and JSON data into HDFS using Flume and processing the data using Hive/Pig.
- Expertise in developing a simple web based application using J2EE technologies like JSP, Servlet, and JDBC.
- Strong experience in front-end technologies like JSP, HTML5, jQuery, JavaScript and CSS3.
- Experience in using PL/SQL to write Stored Procedures, Functions and Triggers.
- Excellent technical and analytical skills with clear understanding of design goals of ER modeling for OLTP and dimension modeling for OLAP.
- Experience in development of Big Data projects using Hadoop, Hive, HDP, Pig, Flume, Storm and Map Reduce open source tools/technologies.
- Strong expertise in Amazon AWS EC2, DynamoDB, S3, Kinesis and other services.
- Expertise in Big Data architectures such as Hadoop (Azure, Hortonworks, Cloudera) distributed systems, MongoDB and NoSQL.
- Experience working with EC2 (Elastic Compute Cloud) cluster instances, setting up data buckets on S3 (Simple Storage Service) and setting up EMR (Elastic MapReduce).
- Worked extensively with Core Java, Struts 2, JSF 2.2, Spring 3.1, Hibernate, Servlets and JSP, with hands-on experience in PL/SQL, XML and SOAP.
- In depth understanding/knowledge of Hadoop Architecture and various components such as HDFS, Job Tracker, Task Tracker, NameNode, DataNode.
- Well versed in working with relational database management systems such as Oracle 9i/12c, MS SQL Server and MySQL.
- Hands on experience in working on XML suite of technologies like XML, XSL, XSLT, DTD, XML Schema, SAX, DOM, JAXB.
- Hands-on experience in advanced Big Data technologies like the Spark ecosystem (Spark SQL, MLlib, SparkR and Spark Streaming), Kafka and predictive analytics.
- Knowledge of the Software Development Life Cycle (SDLC), Agile and Waterfall methodologies.
- Experienced in developing applications using Java, Python and UNIX shell scripting.
- Experience in consuming web services with Apache Axis using JAX-RS (REST) APIs.
- Experienced with the build tools Maven and Ant and the logging tool Log4j.
- Experience in working with web servers like Apache Tomcat and application servers like IBM WebSphere and JBoss.
- Experience in working with Eclipse IDE, NetBeans and Rational Application Developer.
- Hands-on experience with Hadoop/Big Data technologies for storage, querying, processing and analysis of data.
- Excellent interpersonal and communication skills; creative, research-minded, technically competent and result-oriented, with strong problem-solving and leadership skills and the ability to work well with people and maintain good relationships within the organization.
TECHNICAL SKILLS:
Operating System: Linux, UNIX, IOS, TinyOS, Sun Solaris, HP-UX, Windows 8, Windows 7, CentOS, Ubuntu.
Hadoop/Big Data: HDFS, MapReduce, MRUnit, YARN, Hive, Pig, HBase, Impala, ZooKeeper, Sqoop, Oozie, Apache Cassandra, Scala, Flume, Spark, Apache Ignite, Avro, AWS.
Languages: HQL, R, Python, XPath, Spark, PL/SQL, Pig Latin.
Data Warehousing & BI: Informatica PowerCenter 9.x/8.x/7.x, PowerExchange, IDQ
ETL Tools: IBM InfoSphere DataStage 11.5, MSBI (SSIS)
Database: SQL Server, Oracle, MySQL, HBase, MongoDB, Cassandra.
Debugging tools: Microsoft SQL Server Management Studio 2008, Business Intelligence Development Studio 2008, RAD, Subversion, BMC Remedy
Version Control: SVN, Git, CVS, TPump, MLoad, FastExport.
Web Technologies: ASP.NET, ASP, HTML, DHTML, CSS, Web Services, XML, Java, JSP, JavaScript.
GUI Editors: Eclipse, Maven, Ant, JUnit, TestNG, Jenkins, SoapUI, PuTTY, Log4j
Others: MS Office, Adobe Photoshop, DOM manipulations, Responsive Web Design, Karma, Grunt, Jasmine, Rally, Confluence, Jira.
Tools: Eclipse, Tableau, Git, SVN, Concurrent Versions System (CVS), RAD, BMC Remedy, PuTTY, WinSCP, FileZilla, ServiceNow
PROFESSIONAL EXPERIENCE:
Confidential, Fort Lauderdale, FL
Big Data/Hadoop Developer
Responsibilities:
- Evaluated business requirements and prepared detailed specifications that follow project guidelines required to develop written programs.
- Analyzed the Hadoop cluster and different Big Data technologies, including HBase and Cassandra.
- Created RDDs and applied data filters in Spark, and created Cassandra tables and Hive tables for user access.
- Involved in Hadoop cluster administration and successfully maintained large volumes of storage.
- Managed and reviewed Hadoop log files. Tested and reported defects following Agile methodology.
- Worked with the cloud provisioning team on capacity planning and sizing of the nodes (master and slave) for an AWS EMR cluster.
- Worked with Amazon EMR to process data directly in S3 and to copy data from S3 to the Hadoop Distributed File System (HDFS) on the EMR cluster, setting up Spark Core for analysis work.
- Worked with Apache Solr, used as the indexing and search engine.
- Used Amazon Kinesis to stream data in real time on AWS.
- Worked on both batch and streaming data processing, with ingestion into NoSQL and HDFS using file formats such as Parquet and Avro.
- Involved in configuring and developing the Hadoop environment on AWS cloud services such as EC2, EMR, Redshift, CloudWatch and Route.
- Responsible for coding MapReduce programs and Hive queries, and for testing and debugging the MapReduce programs.
- Extracted real-time feeds using Spark Streaming, converted them to RDDs, processed the data into DataFrames and loaded the data into Cassandra.
- Involved in data acquisition, data pre-processing and data exploration for a telecommunication project in Scala.
- Worked on Cloudera distribution and deployed on AWS EC2 Instances.
- Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and Zookeeper.
- Implemented MapReduce programs to handle semi-structured and unstructured data such as XML, JSON and Avro data files, and sequence files for log files.
- Developed various Python scripts to find vulnerabilities with SQL Queries by doing SQL injection, permission checks and performance analysis.
- Experienced in converting Hive/SQL queries into Spark transformations using Spark RDD, Scala and Python.
- Imported weblogs and unstructured data using Apache Flume and stored the data in a Flume channel.
- Exported event weblogs to HDFS by creating an HDFS sink that deposits the weblogs directly in HDFS.
- Used Elasticsearch as a distributed RESTful web service with MVC for parsing and processing XML data.
- Utilized XML and XSL Transformation for dynamic web-content and database connectivity.
- Responsible for managing data coming from different sources and applications.
- Supported MapReduce programs running on the cluster.
- Involved in loading data from the UNIX file system to HDFS.
- Involved in designing the schema, writing CQL queries and loading data using Cassandra.
- Built the automated build and deployment framework using Jenkins, Maven, etc.
- Imported and exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
- Supported the Cloud Strategy team in integrating analytical capabilities into an overall cloud architecture and business case development.
Environment: HBase, Hive, Hadoop, MapReduce, Spark, Cassandra, Kafka, ZooKeeper, XML, JSON, Python, UNIX, Jenkins, Maven
Confidential
Big Data Hadoop Developer
Responsibilities:
- Gathered the business requirements from the business partners.
- Responsible for managing data coming from the warehouse.
- Created all external tables as a raw data zone on top of Hive before data arrives in the big data layer from the warehouse.
- Received the data in the form of dimension and fact tables; all dimension tables are updated monthly. Applied the star schema concept to build this project.
- Responsible for creating Hive scripts and shell scripts for the base data load as well as the delta load.
- Because the data comes from the warehouse, designed a pre-processing step that slims down the dimensions for each dimension and fact table for loss and premium across all the dashboards.
- Designed and developed the star schema. Good knowledge of identifying dimensions and facts.
- Hands-on experience in developing the staging layer from the slimmed-down dimensions. The staging layer does not have any keys and consists of redundant data.
- Involved in creating data mart dimension and fact tables from the staging layer.
- Moved the entire data mart from QA to PROD.
- Validated all the data marts from the developer standpoint.
- Interacted with the QlikView team while generating reports.
- Hands-on experience in synchronizing tables between Hive and Big SQL. Good knowledge of working with Big SQL.
- Experience in delivering data marts to QlikView, with users accessing the data from Big SQL.
- Responsible for maintaining full-refresh data every quarter.
- Involved in HDFS maintenance and loading of structured data.
- Involved in managing and reviewing Hadoop log files.
- Imported data from MySQL to HDFS on a regular basis using Sqoop.
- Wrote Hive queries for data analysis to meet the business requirements.
- Created Hive tables and worked on them using HiveQL.
- Utilized Agile Scrum methodology to help manage and organize a team of 4 developers with regular code review sessions.
- Maintained the code repository in Borland StarTeam, from which all code is moved to the production environment once UAT and QA succeed.
- Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting
- Working knowledge of writing Pig Load and Store functions.
- Focusing on optimization, used Spark and Scala to achieve high throughput.
- Processing that took 10 hours in Hive to complete one dashboard was later completed within 30 minutes using Spark/Scala.
- Trained offshore resources to adopt the client standards.
Environment: IBM BigInsights 4.2, Spark 1.6, Scala, Big SQL, Hive, Java 1.8, Shell script, Pig, and Borland StarTeam 14.0.
Confidential, Madison, WI
Hadoop Developer
Responsibilities:
- Ingested data into the Indie-Data Lake using an open-source Hadoop distribution to process structured, semi-structured and unstructured datasets, using open-source Apache tools such as Flume and Sqoop to load into the Hive environment (on the IBM BigInsights 4.1 platform).
- Developed Spark code using Scala and Spark SQL for faster testing and data processing.
- Experience with batch processing of data sources using Apache Spark.
- Developed predictive analytics using Apache Spark Scala APIs.
- Developed MapReduce jobs using the Java API to parse the raw data and store the refined data.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Imported millions of structured records from relational databases using Sqoop import, processed them using Spark and stored the data in HDFS in Parquet format.
- Implemented Spark DataFrame transformations and actions to migrate MapReduce algorithms.
- Explored Spark for improving the performance and optimization of existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames, pair RDDs and Spark on YARN.
- Used the DataFrame API in Java for converting the distributed collection of data organized into named columns.
- Developed solutions to pre-process large sets of structured data in different file formats (text files, Avro data files, sequence files, XML and JSON files, ORC and Parquet).
- Automated and scheduled the Sqoop jobs using UNIX shell scripts.
- Worked on Database designing, Stored Procedures, and PL/SQL.
- Involved in identifying job dependencies to design workflow for Oozie & YARN resource management.
- Responsible for managing existing data extraction jobs and played a vital role in building new data pipelines from various structured and unstructured sources into Hadoop.
- Worked on a product team using Agile Scrum methodology to design, develop, deploy and support solutions that leverage the client's big data platform.
- Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
- Designed and coded from specifications; analyzed, evaluated, tested, debugged and implemented complex software applications.
- Developed Sqoop Scripts to extract data from DB2 EDW source databases into HDFS.
- Tuned Hive and Pig to improve performance and solved performance issues in both kinds of scripts, with an understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
- Created partitions and buckets based on State for further processing using bucket-based Hive joins.
- Implemented Cloudera Manager on the existing cluster.
- Worked extensively with Cloudera Distribution Hadoop, CDH 5.x and CDH 4.x.
- Designed and developed various SSIS packages (ETL) to extract and transform data, and was involved in scheduling SSIS packages.
- Created ETL packages with different data sources (SQL Server, flat files, Excel source files, XML files, etc.) and then loaded the data into destination tables by performing complex transformations using SSIS/DTS packages.
- Troubleshot, debugged and fixed wrong-data and missing-data problems in both Oracle Database and MongoDB.
Environment: HDFS, MapReduce, Java API, JSP, JavaBean, Pig, Hive, Sqoop, Flume, Oozie, HBase, Kafka, Impala, Spark Streaming, Storm, YARN, Eclipse, Spring, PL/SQL, UNIX Shell Scripting, Cloudera.
Confidential, New York, NY
Hadoop Developer
Responsibilities:
- Imported and exported data into HDFS and Hive using Sqoop and Kafka.
- Received an average of 60 GB of data daily. Overall, the project's data warehouse held 4 PB of data, and a 110-node cluster was used to process it.
- Developed different components of the system, such as Hadoop processes involving MapReduce and Hive.
- Developed an interface for validating incoming data into HDFS before kicking off the Hadoop process.
- Wrote Hive queries using optimization techniques such as user-defined functions and customized Hadoop shuffle and sort parameters.
- Along with the Infrastructure team, designed and developed a Kafka- and Storm-based data pipeline. This pipeline also involved Amazon Web Services EMR, S3 and RDS.
- Tuned Hive and Pig to improve performance and solved performance-related issues in Hive and Pig scripts, with a good understanding of joins, grouping and aggregation and how they translate to MapReduce jobs.
- Developed MapReduce programs for different types of files using combiners with UDFs and UDAFs.
- Experience working with a multi-node cluster tool that offers several commands to report HBase usage.
- Experience in creating, dropping and altering tables at run time without blocking updates and queries using HBase and Hive.
- Used HCatalog to access Hive table metadata from MapReduce or Pig code.
- Experience in pre-processing the logs and semi-structured content stored on HDFS using Pig.
- Experience in structured data imports and exports into Hive warehouse which enables business analysts to write Hive queries.
- Experience in managing and reviewing Hadoop log files.
- Experience with UNIX shell scripts for business processes and for loading data from different interfaces to HDFS.
- Responsible for developing a data pipeline using Flume, Sqoop and Pig to extract the data from weblogs and store it in HDFS.
- Designed and implemented various metrics that can statistically signify the success of the experiment.
- Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS.
- Responsible for processing ingested raw data using MapReduce, Apache Pig and Hive.
- Developed Pig scripts for change data capture and delta record processing between newly arrived data and already existing data in HDFS.
- Involved in pivoting the HDFS data from rows to columns and columns to rows.
- Involved in creating Hive tables and Pig tables, loading data, and writing Hive queries and Pig scripts.
- Participated in a couple of workshops on Spark, RDDs and Spark Streaming.
- Hands-on experience with Eclipse, PuTTY, WinSCP, VNC Viewer, etc.
Environment: Linux 6.7, CDH 5.5.2, MapReduce, Hive 1.1, Pig, HBase, YARN, Oozie, Shell Script, AWS, Sqoop 1.4.3, Eclipse, Java 1.8.
Confidential
Java Developer
Responsibilities:
- Developed front-end screens using JSP, HTML and CSS.
- Developed server-side code using Struts and Servlets.
- Developed core Java classes for exceptions, utility classes, business delegates and test cases.
- Developed SQL queries using MySQL and established connectivity.
- Worked with Eclipse using the Maven plugin for Eclipse IDE.
- Designed the user interface of the application using HTML5, CSS3, JSP and JavaScript.
- Tested the application functionality with JUnit Test Cases.
- Developed all the user interfaces using the JSP framework and wrote client-side validations using JavaScript.
- Used jQuery extensively for developing interactive web pages.
- Developed the user interface presentation screens using HTML, XML, and CSS.
- Developed shell scripts to trigger the Java batch job and send a summary email with the batch job status.
- Coordinated with the QA lead on development of the test plan, test cases, test code and actual testing; responsible for defect allocation and ensuring those defects were resolved.
- Application was developed in Eclipse IDE and was deployed on Tomcat server.
- Involved in Agile scrum methodology.
- Provided support for bug fixes and functionality changes.
Environment: Java/J2EE, Oracle 10g, SQL, PL/SQL, JSP, Hibernate, WebLogic 8.0, HTML, AJAX, JavaScript, JDBC, XML, UML, JUnit, Eclipse.
Confidential
Java Associate
Responsibilities:
- Actively participated in all phases of the Software Development Life Cycle (SDLC).
- Worked extensively with Core Java (Collections, Generics and Templates, and Interfaces for passing data from the GUI layer to the business layer).
- Developed the web interface for user modules using JSP, HTML, XML, CSS, JavaScript and AJAX.
- Developed using J2EE design patterns like Command Pattern, Session Facade, Business Delegate, Service Locator, Data Access Object and Value Object patterns.
- Used JUnit test cases to test the application and performed random checks to analyze the portability, reliability and flexibility of the project.
- Analyzed, designed and implemented Online Enrollment Web Application using Struts, JSTL, Hibernate, UML, Design Patterns and Log4J.
- Used advanced jQuery, AJAX, JavaScript, CSS and pure CSS layouts, with database access using JDBC for Oracle.
- Involved in writing application level code to interact with APIs, Web Services using AJAX, JSON and XML.
- Created Servlets and JavaServer Pages, which route submittals to the appropriate Enterprise JavaBean (EJB).
- Followed Scrum and iterative Agile methodologies as the development process for the web application.
- Responsible for the performance of PL/SQL procedures and SQL queries.
- Involved in deploying components on the WebLogic application server.
- Deployed applications on Linux client machines.
Environment: Java EE 5, JSP 2.0, JavaBean, EJB 3.0, JDBC, Application Server, Eclipse, Java API, J2SDK 1.4.2, JDK 1.5, JMS, Message queues, Web services, UML, XML, HTML, XHTML, JavaScript, Log4j, CVS, JUnit, Windows and Sun OS 2.7/2.8.
