Big Data Developer Resume
Warren, NJ
PROFESSIONAL SUMMARY:
- Around 8 years of experience in the IT industry, including 3+ years in Big Data implementing complete Hadoop solutions alongside Java.
- Good working experience using Apache Hadoop ecosystem components like MapReduce, HDFS, Hive, Sqoop, Pig, Oozie, Flume, HBase and ZooKeeper.
- Experience writing UDFs and integrating them with Hive and Pig.
- Experience with SequenceFile, Avro and ORC file formats and compression.
- Experience with Hadoop distributions: Cloudera and Hortonworks.
- Imported and exported data into HDFS and Hive using Sqoop.
- Experience with job workflow scheduling and coordination tools like Oozie and ZooKeeper.
- Extensive knowledge in using SQL Queries for backend database analysis.
- Strong knowledge of NoSQL column-oriented databases like HBase and Cassandra and their integration with Hadoop clusters.
- Experience importing and exporting data using Sqoop from HDFS to Relational Database Systems (RDBMS) and vice versa.
- Led many data analysis and integration efforts involving Hadoop along with ETL.
- Hands-on experience with an enterprise data lake supporting various use cases including analytics, processing, storage and reporting of voluminous, rapidly changing structured and unstructured data.
- Extensive experience with SQL, PL/SQL and database concepts.
- Transferred bulk data from RDBMS sources like Teradata into HDFS using Sqoop.
- Experience in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
- Well-versed in Agile and other SDLC methodologies; able to coordinate with owners and SMEs.
- Worked on different operating systems like UNIX, Linux and Windows.
- Diverse experience utilizing Java tools in business, web and client-server environments, including Java Platform, Enterprise Edition (Java EE), Enterprise JavaBeans (EJB), JavaServer Pages (JSP), Java Servlets (including JNDI), Struts and Java Database Connectivity (JDBC) technologies.
- Solid understanding of multiple programming languages, including C#, C, C++, JavaScript, HTML and XML.
- Experience in web application design using open-source MVC frameworks such as Spring and Struts.
TECHNICAL SKILLS:
Hadoop Core Services: HDFS, MapReduce, Spark, YARN
Hadoop Distributions: Hortonworks, Cloudera, Apache
NoSQL Databases: HBase, Cassandra
Hadoop Data Services: Hive, Pig, Sqoop, Flume
Hadoop Operational Services: Zookeeper, Oozie
Monitoring Tools: Cloudera Manager
Cloud Computing Tools: Amazon AWS
Languages: C, Java/J2EE, Python, SQL, PL/SQL, Pig Latin, HiveQL, Unix Shell Scripting
Java & J2EE Technologies: Core Java, Servlets, Hibernate, Spring, Struts
Application Servers: WebLogic, WebSphere, JBoss, Tomcat
Databases: Oracle, MySQL, PostgreSQL, Teradata
Operating Systems: UNIX, Windows, LINUX
Build Tools: Jenkins, Maven, ANT
Development Tools: Microsoft SQL Studio, Toad, Eclipse, NetBeans
Development methodologies: Agile/Scrum
Visualization and Analytics Tools: Tableau, QlikView
PROFESSIONAL EXPERIENCE:
Confidential, Warren, NJ
Big Data Developer
Responsibilities:
- Involved in the complete Big Data flow of the application: data ingestion from upstream into HDFS, processing the data in HDFS and analyzing it using several tools.
- Imported data in various formats like JSON, SequenceFile, text, CSV, Avro and Parquet into the HDFS cluster, with compression applied for optimization.
- Ingested data from RDBMS sources like Oracle, SQL Server and Teradata into HDFS using Sqoop.
- Configured Hive and wrote Hive UDFs and UDAFs; also created static and dynamic partitions with bucketing (a Hive UDF sketch in Java follows this list).
- Managed and scheduled jobs on a Hadoop cluster.
- Created Hive external tables, loaded data into them and queried the data using HiveQL.
- Wrote Hive jobs to parse logs and structure them in a tabular format to facilitate effective querying of the log data.
- Developed Pig Latin scripts for the analysis of semi-structured data and for extracting data from web server output files to load into HDFS.
- Developed Pig UDFs for manipulating data according to business requirements and also worked on developing custom Pig loaders.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Managed and reviewed the Hadoop log files using shell scripts.
- Migrated ETL jobs to Pig scripts to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Used Hive join queries to join multiple tables of a source system and loaded the results into Elasticsearch.
- Collected log data from web servers and ingested it into HDFS using Flume.
- Designed and created various analytical reports and automated dashboards to help users identify critical KPIs and facilitate strategic planning in the organization.
- Involved in cluster maintenance, monitoring and troubleshooting.
- Created data pipelines per the business requirements and scheduled them using Oozie coordinators.
- Maintained technical documentation for every step of the development environment and for launching Hadoop clusters.
- Worked on different file formats like SequenceFiles, XML files and MapFiles using MapReduce programs.
- Worked with the Avro data serialization system to handle JSON data formats.
- Used Amazon Web Services S3 to store large amounts of data in a common repository.
- Worked with the Data Science team to gather requirements for various data mining projects.
- Wrote shell scripts to automate rolling day-to-day processes.
- Built applications using Maven and integrated them with continuous integration servers like Jenkins to run build jobs.
- Used an enterprise data warehouse to store information and make it accessible across the organization.
- Worked with BI tools such as Tableau to create weekly, monthly and daily dashboards and reports in Tableau Desktop, published against data on the HDFS cluster.
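The Hive UDF work above follows Hive's classic reflection-based UDF API. Below is a minimal sketch in Java, assuming the org.apache.hadoop.hive.ql.exec.UDF base class; the class name and the normalization rule are hypothetical examples, not the actual project code.

    import org.apache.hadoop.hive.ql.exec.UDF;
    import org.apache.hadoop.io.Text;

    // Hypothetical UDF: strips non-digit characters so phone numbers
    // join cleanly across source systems.
    public final class NormalizePhoneUDF extends UDF {

        // Hive locates this evaluate() signature by reflection at query time.
        public Text evaluate(Text input) {
            if (input == null) {
                return null;
            }
            return new Text(input.toString().replaceAll("[^0-9]", ""));
        }
    }

Such a UDF would be packaged in a JAR, registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION, and then called like any built-in function in HiveQL.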
Environment: Hadoop, HDFS, Hive, Oozie, Pig, Sqoop, Shell Scripting, HBase, Jenkins, Tableau, Oracle, MySQL, Teradata and AWS.
Confidential, Florham Park, NJ
Big Data Developer
Responsibilities:
- Responsible for building scalable distributed data solutions using Hadoop.
- Used Sqoop to load data from MySQL and other sources into HDFS on a regular basis.
- Wrote multiple MapReduce programs for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats (see the MapReduce sketch after this list).
- Reviewed the HDFS usage and system design for future scalability and fault-tolerance.
- Involved in loading data from the Linux file system to HDFS.
- Loaded and transformed large sets of structured, semi-structured and unstructured data in various formats like text, zip, XML and JSON.
- Defined job flows and developed simple to complex MapReduce jobs as per the requirements.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Developed Pig UDFs for manipulating data according to business requirements and also worked on developing custom Pig loaders.
- Hands-on experience setting up an HBase column-based storage repository for archival and historical data.
- Responsible for creating Hive tables based on business requirements.
- Used an enterprise data lake to support various use cases including analytics, processing, storage and reporting of voluminous, rapidly changing structured and unstructured data.
- Along with the infrastructure team, involved in the design and development of a Kafka- and Storm-based data pipeline.
- Implemented partitioning, dynamic partitions and buckets in Hive for efficient data access.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Involved in data modeling and in sharding and replication strategies in Cassandra.
- Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
- Knowledgeable in handling Hive queries using Spark SQL integrated with the Spark environment.
- Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports for the BI team.
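One of the MapReduce aggregation programs mentioned above can be sketched with the standard Hadoop 2.x Java API. This is a minimal illustration rather than the actual project code; the class names and the CSV column being counted are hypothetical placeholders.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class CsvAggregationJob {

        // Emits (category, 1) per CSV record; column 2 is a hypothetical category field.
        public static class CategoryMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            private static final LongWritable ONE = new LongWritable(1);
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                String[] fields = value.toString().split(",");
                if (fields.length > 2) {
                    ctx.write(new Text(fields[2]), ONE);
                }
            }
        }

        // Sums counts per category; also reused as a combiner for map-side pre-aggregation.
        public static class SumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable v : values) {
                    sum += v.get();
                }
                ctx.write(key, new LongWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "csv-aggregation");
            job.setJarByClass(CsvAggregationJob.class);
            job.setMapperClass(CategoryMapper.class);
            job.setCombinerClass(SumReducer.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(LongWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }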
Environment: Apache Hadoop 2.x, Cloudera, HDFS, MapReduce, Hortonworks, Hive, Pig, HBase, Spark, Scala, Sqoop, Kafka, Flume, Cassandra, Oracle 11g/10g, Linux, XML, MySQL.
Confidential
Hadoop Developer
Responsibilities:
- Understood business needs, analyzed functional specifications and mapped them to the design and development of MapReduce programs and algorithms.
- Optimized Hadoop MapReduce code and Hive and Pig scripts for better scalability, reliability and performance.
- Developed Oozie workflows for application execution.
- Performed data migration from legacy RDBMS databases to HDFS using Sqoop.
- Wrote Pig scripts for data processing.
- Created HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios (an HBase table-creation sketch follows this list).
- Implemented Hive tables and HiveQL queries for the reports.
- Imported data from Cassandra into HDFS using an export utility.
- Developed shell scripts and automated data management for end-to-end integration work.
- Performed data validation using Hive dynamic partitioning and bucketing.
- Wrote and used complex data types to store and retrieve data using HiveQL in Hive.
- Developed Hive queries to analyze reducer output data.
- Implemented ETL code to load data from multiple sources into HDFS using Pig scripts.
- Highly involved in designing the next-generation data architecture for unstructured data.
- Developed Pig Latin scripts to extract data from source systems.
- Created and maintained technical documentation for executing Hive queries and Pig scripts.
- Involved in extracting data from Hive and loading it into an RDBMS using Sqoop.
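Creating an HBase table as described above is normally done through the HBase Java Admin API. A minimal sketch, assuming the HBase 1.x client; the table and column family names are hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;

    public class CreatePortfolioTable {
        public static void main(String[] args) throws Exception {
            // Picks up hbase-site.xml (ZooKeeper quorum, etc.) from the classpath.
            Configuration conf = HBaseConfiguration.create();
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Admin admin = connection.getAdmin()) {
                TableName table = TableName.valueOf("portfolio_events"); // hypothetical name
                if (!admin.tableExists(table)) {
                    HTableDescriptor descriptor = new HTableDescriptor(table);
                    // A single column family keeps related columns stored together on disk.
                    descriptor.addFamily(new HColumnDescriptor("d"));
                    admin.createTable(descriptor);
                }
            }
        }
    }

Structured, semi-structured and unstructured payloads can then be written as cells under the one column family, with the row key designed around the dominant read pattern.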
Environment: HDFS, MapReduce, MySQL, Cassandra, Hive, HBase, Oozie, Pig, ETL, Hortonworks (HDP 2.0), Shell Scripting, Linux, Sqoop, Flume and Oracle 11g.
Confidential
Hadoop Developer
Responsibilities:
- Imported data from Microsoft SQL Server, MySQL and Teradata into HDFS using Sqoop.
- Developed Oozie workflows to automate the tasks of loading data into HDFS.
- Used Hive to analyze partitioned and bucketed data to compute various reporting metrics.
- Involved in creating Hive tables, loading data and writing queries that run internally as MapReduce jobs.
- Involved in creating Hive External tables for HDFS data.
- Solved performance issues in Hive and Pig scripts with an understanding of joins, grouping and aggregation and how they translate into MapReduce jobs.
- Used Spark for transformations, event joins and some aggregations before storing the data into HDFS.
- Troubleshot and resolved data quality issues and maintained a high level of accuracy in the data being reported.
- Analyzed large data sets to determine the optimal way to aggregate them.
- Worked on the Oozie workflow to run multiple Hive and Pig jobs.
- Involved in creating Hive UDFs.
- Developed automated shell scripts to execute Hive queries.
- Involved in processing ingested raw data using Apache Pig.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
- Worked on different file formats like JSON, Avro, ORC and Parquet and compression codecs like Snappy, zlib and LZ4.
- Executed HiveQL in Spark using Spark SQL (see the Spark SQL sketch after this list).
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs and Scala.
- Gained knowledge in creating Tableau dashboards for reporting on analyzed data.
- Expertise with NoSQL databases like HBase.
- Experienced in managing and reviewing the Hadoop log files.
- Used GitHub as the repository for committing and retrieving code, and Jenkins for continuous integration.
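Executing HiveQL through Spark SQL, as mentioned above, can be sketched with Spark's Java API. This is a minimal illustration assuming Spark 2.x with Hive support enabled; the database, table, columns and output path are hypothetical.

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.SparkSession;

    public class DailyEventCounts {
        public static void main(String[] args) {
            // enableHiveSupport() lets Spark SQL resolve tables from the Hive metastore.
            SparkSession spark = SparkSession.builder()
                    .appName("daily-event-counts")
                    .enableHiveSupport()
                    .getOrCreate();

            // The same HiveQL that ran on MapReduce now runs as Spark transformations.
            Dataset<Row> daily = spark.sql(
                    "SELECT event_date, COUNT(*) AS events "
                  + "FROM logs.web_events GROUP BY event_date");

            daily.write().mode("overwrite").parquet("/data/reports/daily_events");
            spark.stop();
        }
    }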
Environment: HDFS, MapReduce, Sqoop, Hive, Pig, Oozie, MySQL, Eclipse, Git, GitHub, Jenkins.
Confidential
Java Developer
Responsibilities:
- Involved in various stages of Enhancements in the Application by doing the required analysis, development, and testing.
- Prepared high-level and low-level design documents and worked on digital signature generation.
- Created use case, class and sequence diagrams for the analysis and design of the application.
- Developed the logic and code for the registration and validation of enrolling customers.
- Developed web-based user interfaces using the Struts framework.
- Handled client-side validations using JavaScript.
- Wrote SQL queries, stored procedures and enhanced performance by running explain plans.
- Involved in integration of various Struts actions in the framework.
- Used the validation framework for server-side validations.
- Created test cases for the Unit and Integration testing.
- The front end was integrated with the Oracle database using the JDBC API through the JDBC-ODBC bridge driver on the server side.
- Designed project related documents using MS Visio which includes Use case, Class and Sequence diagrams.
- Wrote the end-to-end flow, i.e. controller, service and DAO classes, per the Spring MVC design, and wrote business logic using the core Java API and data structures (a controller sketch follows this list).
- Used Spring JMS message-driven beans to receive messages from other teams, with IBM MQ for queuing.
- Developed presentation-layer code using JSP, HTML, AJAX and jQuery.
- Developed the business layer using Spring (IoC, AOP), DTOs and JTA.
- Developed application service components and configured beans using Spring IoC; implemented the persistence layer and configured Ehcache to load static tables into a secondary storage area.
- Involved in the development of the user interfaces using HTML, JSP, JavaScript, Dojo Toolkit, CSS and AJAX.
- Created tables, triggers, stored procedures, SQL queries, joins, integrity constraints and views for multiple Oracle 11g databases using Toad.
- Developed the project using industry-standard design patterns like Singleton, Business Delegate and Factory for better maintenance and reusability of code.
- Developed unit test cases using the JUnit framework to test the accuracy of code, with logging via SLF4J and Log4j.
- Worked with the ClearQuest defect tracking system.
- Worked with the Spring STS IDE, deployed to Tomcat and WebSphere servers, and used Maven as the build tool.
- Responsible for code sanity in the integration stream, using ClearCase as the version control tool.
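The controller/service/DAO flow described above can be sketched as a typical Spring MVC controller. All names are hypothetical, and the service interface and form bean are stubbed inline only to keep the example self-contained.

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.stereotype.Controller;
    import org.springframework.ui.Model;
    import org.springframework.web.bind.annotation.ModelAttribute;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;

    @Controller
    public class EnrollmentController {

        // Service-layer bean injected by the Spring IoC container.
        @Autowired
        private EnrollmentService enrollmentService;

        @RequestMapping(value = "/enroll", method = RequestMethod.POST)
        public String enroll(@ModelAttribute("customer") CustomerForm form, Model model) {
            // Business logic stays in the service layer, which delegates to a DAO.
            enrollmentService.register(form);
            model.addAttribute("customerName", form.getName());
            return "enrollConfirmation"; // logical view name resolved to a JSP
        }
    }

    // Minimal stubs so the sketch compiles; real types would live in their own files.
    interface EnrollmentService {
        void register(CustomerForm form);
    }

    class CustomerForm {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }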
Environment: Java, J2EE, Spring, Spring Batch, Spring JMS, MyBatis, HTML, CSS, AJAX, jQuery, JavaScript, JSP, XML, UML, JUnit, IBM WebSphere, Maven, ClearCase, SoapUI, Oracle 11g, IBM MQ.
