Sr. Hadoop/Big Data Developer Resume
Charlotte, NC
SUMMARY
- Over 8 years of IT experience in Java/J2EE development and the Hadoop ecosystem, including MapReduce, Pig, Hive, HBase, Sqoop, Flume, and Apache Spark with Scala.
- Extensive experience writing embedded Pig scripts for analysis, exposed as RESTful web services using Spring.
- Proficiency in NoSQL databases such as HBase and MongoDB.
- Experienced in analyzing big data and providing technical recommendations to improve existing systems.
- Expertise in designing and developing applications using Java/J2EE technologies, including Servlets, EJB, JSP, JDBC, and JMS.
- Experienced in developing Java/J2EE applications using Struts, Spring, and Hibernate.
- Experienced in writing database objects such as stored procedures, triggers, SQL, PL/SQL packages, and cursors for Oracle, SQL Server, DB2, and Sybase.
- Experienced with major components of the Hadoop ecosystem, including Hive, Sqoop, and Flume, with knowledge of the MapReduce/HDFS framework.
- Excellent knowledge of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and MapReduce concepts.
- Experience using various Hadoop distributions (Cloudera, Hortonworks, MapR, etc.) to fully implement and leverage new Hadoop features.
- Experience developing MapReduce programs on Apache Hadoop for working with big data.
- Extensive experience writing Pig Latin scripts using operators such as LOAD, STORE, DUMP, FILTER, DISTINCT, FOREACH, GENERATE, GROUP, COGROUP, ORDER, LIMIT, UNION, and SPLIT to extract data from data files and load it into HDFS.
- Excellent experience in UNIX shell scripting and concepts such as OOP, OOAD, data structures, and design patterns.
- Hands-on experience working with Cassandra on a Hadoop cluster.
- Experience with Hadoop distributions such as Cloudera, Hortonworks, BigInsights, and MapR, plus Windows Azure and Impala.
TECHNICAL SKILLS
Hadoop/Big Data Technologies: HDFS, MapReduce, Sqoop, Flume, Pig, Hive, Oozie, Impala, ZooKeeper, Cloudera Manager; NoSQL databases: HBase, MongoDB
Languages: Java (Core and Advanced), SQL, PL/SQL, WSDL
Front End Technologies: HTML5, JavaScript, CSS3, AJAX
Web Technologies: Servlets, JSP, JSTL, JDBC
Monitoring and Reporting: Tableau, custom shell scripts
Hadoop Distributions: Hortonworks, Cloudera, MapR
Unit Test Frameworks: JUnit, Mockito
Design Methodologies: Rational Rose, MS Visio & StarUML
Application Servers: JBoss 5.0/7.1, WebSphere 6.x, WebLogic 11g
Databases: Oracle 9i/10g/11g/12c, MySQL, SQL Server 2008
IDEs: Eclipse, NetBeans, RAD, JDeveloper, TOAD, SQL Developer
Testing Tools: JUnit 4.x, EasyMock 3.x
SCM Tools: Mercurial, GitHub, Subversion, CVS, Perforce, ClearCase
Operating Systems: Linux, UNIX, Windows NT/XP/2000, Mac OS X
Build Tools: Maven 3.x, Ant 1.x
PROFESSIONAL EXPERIENCE
Confidential, Charlotte, NC
Sr. Hadoop/Big Data Developer
Responsibilities:
- Responsible for managing data coming from different sources; involved in HDFS maintenance and loading of structured and unstructured data.
- Imported and exported data between RDBMS and Hive using Sqoop.
- Partitioned Hive tables, created external tables, applied the differences between managed and external tables, and optimized Hive analytics queries to improve job performance.
- Created and ran Sqoop jobs with incremental load to populate Hive external tables.
- Developed Pig scripts in areas where extensive hand coding needed to be reduced.
- Created HBase tables to store data in variable formats coming from different portfolios.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website.
- Developed backend (server side) in Scala.
- Designed the technical solution for real-time analytics using Kafka and HBase (see the consumer sketch below).
- Solved performance issues in Hive and Pig scripts by understanding joins, grouping, and aggregation, and how they translate to MapReduce jobs.
- Designed a conceptual model with Spark for performance optimization.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Developed MapReduce programs to parse the raw data and store the refined data in tables.
- Developed syllabus/curriculum data pipelines from syllabus/curriculum web services to HBase and Hive tables.
- Analyzed data with Hive, Pig, and Hadoop Streaming.
- Worked on creating the data model for Cassandra from the current Oracle data model.
- Worked with CQL to execute queries on the data persisted in the Cassandra cluster.
- To analyze data migrated to HDFS, used the Hive data warehouse tool and developed Hive queries.
- Worked on installing and configuring EC2 instances on Amazon Web Services (AWS) to establish clusters on the cloud.
- Worked collaboratively with all levels of business stakeholders to architect, implement, and test Big Data analytical solutions from disparate sources.
- Used Tableau for visualization and report generation.
- Used Flume to collect, aggregate, and store the log data from different web servers.
- Worked on Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods technologies.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Exported the analyzed data to an RDBMS using Sqoop to generate reports for the BI team.
- Developed customized UDFs in Java to extend Hive and Pig Latin functionality for querying and processing data (a minimal sketch follows this list).
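For illustration, a minimal sketch of the kind of Hive UDF described above, assuming the classic org.apache.hadoop.hive.ql.exec.UDF base class; the class name and normalization rules are hypothetical:

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Hypothetical UDF: normalizes a URL column so page views group cleanly.
public final class NormalizeUrl extends UDF {
    public Text evaluate(Text url) {
        if (url == null) return null;
        String s = url.toString().trim().toLowerCase();
        int q = s.indexOf('?');
        if (q >= 0) s = s.substring(0, q);                        // drop the query string
        if (s.endsWith("/")) s = s.substring(0, s.length() - 1);  // drop a trailing slash
        return new Text(s);
    }
}
```

Packaged into a JAR, such a function would be registered with ADD JAR and CREATE TEMPORARY FUNCTION and then called inline from HiveQL.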
Environment: Java, MapReduce, Hadoop, Sqoop, Oozie, YARN, Informatica, Kafka, Spark, Scala, Spark SQL, HDFS, HBase, Hive, Pig, Eclipse, CentOS, XML, JSON, SQL, and Oracle 12c.
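The real-time Kafka-to-HBase path mentioned in this role could look roughly like the following minimal consumer; the broker address, topic, table, and column family are assumptions, and poll(Duration) assumes a Kafka 2.x client:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public final class EventsToHBase {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // hypothetical broker
        props.put("group.id", "realtime-analytics");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection hbase = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = hbase.getTable(TableName.valueOf("events"))) {
            consumer.subscribe(Collections.singletonList("events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> r : records) {
                    // Row key = Kafka message key; assumes producers always set one.
                    Put put = new Put(Bytes.toBytes(r.key()));
                    put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                                  Bytes.toBytes(r.value()));
                    table.put(put);
                }
            }
        }
    }
}
```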
Confidential, Columbus, OH
Sr. Hadoop/Big Data Developer
Responsibilities:
- Developed optimal strategies for distributing the web log data over the cluster; imported and exported the stored web log data into HDFS and Hive using Sqoop.
- Worked with highly unstructured and semi-structured data, 2 petabytes in size.
- Created Hive tables and loaded transactional data from Teradata using Sqoop.
- Developed MapReduce (YARN) jobs for cleaning, accessing, and validating the data.
- Wrote Pig scripts to transform raw data from several data sources into baseline data.
- Used Apache Sqoop to load incremental user data into HDFS on a daily basis.
- Involved in Agile methodologies, daily Scrum meetings, and sprint planning.
- Implemented Hive generic UDFs to incorporate business logic into Hive queries.
- Analyzed the web log data using HiveQL to extract the number of unique visitors per day, page views, visit duration, and the most visited pages on the website.
- Integrated Oozie with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of the box (such as MapReduce, Pig, Hive, and Sqoop) as well as system-specific jobs (such as Java programs and shell scripts).
- Created Hive tables and worked on them using HiveQL.
- Designed and implemented partitioning (static and dynamic) and bucketing in Hive.
- Involved in end-to-end implementation of ETL logic.
- Worked with HiveQL on big data logs to perform trend analysis of user behavior on various online modules (see the query sketch below).
- Created Hive tables using partitioning and bucketing techniques, and ingested data from HDFS, Oracle, and MS SQL Server to Netezza and vice versa.
- Generated final reporting data using Tableau for testing by connecting to the corresponding Hive tables using the Hive ODBC connector.
- Involved in migrating Hadoop jobs into higher environments such as SIT, UAT, and Prod.
- Used Apache tooling to monitor the Hadoop cluster and track memory usage across the cluster.
- Involved in creating database SQL and PL/SQL queries and stored procedures.
- Monitored workload, job performance and capacity planning using Cloudera Manager.
- Exported the analyzed data to an RDBMS using Sqoop to generate reports for the BI team.
- Worked collaboratively with all levels of business stakeholders to architect, implement, and test Big Data analytical solutions from disparate sources.
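A minimal sketch of the kind of HiveQL trend analysis described above, submitted over HiveServer2 JDBC; the host, table, and column names are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public final class DailyVisitorTrend {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; auto-registered when on the classpath.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://hiveserver:10000/default");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(
                     "SELECT dt, COUNT(DISTINCT visitor_id) AS visitors, COUNT(*) AS page_views "
                   + "FROM web_logs GROUP BY dt ORDER BY dt")) {
            while (rs.next()) {
                System.out.printf("%s\t%d\t%d%n", rs.getString("dt"),
                        rs.getLong("visitors"), rs.getLong("page_views"));
            }
        }
    }
}
```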
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Oozie, Maven, Shell Scripting, CDH.
Confidential, Bellevue, WA
Sr. Hadoop/Big Data Developer
Responsibilities:
- Helped business processes by developing, installing, and configuring Hadoop ecosystem components that moved data from individual servers to HDFS.
- Installed and configured MapReduce, Hive, and HDFS; implemented a CDH4 Hadoop cluster on CentOS. Assisted with performance tuning and monitoring.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios.
- Worked with Apache Hadoop, Spark, and Scala.
- Supported code/design analysis, strategy development and project planning.
- Created reports for the BI team using Sqoop to export data into HDFS and Hive.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the map-only sketch below).
- Assisted with data capacity planning and node forecasting.
- Collaborated with the infrastructure, network, database, application, and BI teams to ensure data quality and availability.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Developed a data pipeline using Flume, Sqoop, Pig, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MS SQL Server 2008 and Oracle 11g into HDFS using Sqoop.
- Developed workflows in Oozie to automate the tasks of loading data into HDFS and preprocessing it with Pig.
- Developed Pig UDFs for needed functionality not available out of the box in Apache Pig.
- Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get, or copyToLocal.
- Wrote Hive queries to analyze the structured data in the HDFS output folder.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Optimized the mappings using various optimization techniques and debugged existing mappings.
- Developed HiveQL scripts to manipulate the data in HDFS.
- Worked closely with the ETL lead, solution architect, data modeler, and business analysts to understand business requirements, providing expert knowledge and solutions on data warehousing and ensuring delivery of business needs in a timely, cost-effective manner.
- Extensively used PL/SQL Developer for creating database objects and running command scripts to insert configuration data items.
- Used UNIX shell scripts to deploy Oracle Forms and Reports to production servers.
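The Java data-cleaning jobs mentioned above can be illustrated with a map-only MapReduce sketch; the tab-delimited three-field layout and the input/output paths are assumptions:

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Map-only cleaning pass: drops malformed records before staging for Hive.
public final class CleanRecords {
    public static final class CleanMapper extends Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            // Keep only complete rows: timestamp, customer id, amount (illustrative layout).
            if (fields.length == 3 && !fields[1].isEmpty()) {
                ctx.write(NullWritable.get(), value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "clean-records");
        job.setJarByClass(CleanRecords.class);
        job.setMapperClass(CleanMapper.class);
        job.setNumReduceTasks(0);               // map-only: no shuffle or reduce phase
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```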
Environment: Java (JDK 1.6), Eclipse, Linux, CDH 4.x, Sqoop, Pig, Hive, Oozie, UNIX Shell Scripting, Hue, WinSCP, MySQL, Hadoop, PL/SQL, SQL*Plus, HBase, MapReduce, HDFS
Confidential, Minneapolis, MN
Sr. Java/Hadoop Developer
Responsibilities:
- Maintained system integrity of all subcomponents (primarily HDFS, MapReduce, HBase, and Hive).
- Migrated the needed data from MySQL into HDFS using Sqoop and imported flat files in various formats into HDFS.
- Responsible for troubleshooting errors in GlassFish servers and bouncing the GlassFish and HAProxy services.
- Implemented GUI screens using Servlets, JSP, tag libraries, JSTL, JavaBeans, HTML, and JavaScript, and the Struts framework with the MVC design pattern.
- Loaded the data into HBase tables for the UI web application.
- Developed clickable prototypes in HTML, DHTML, Photoshop, CSS and JavaScript.
- Used CVS for source code version control.
- Wrote customized Hive UDFs in Java where the functionality was too complex for built-in functions.
- Designed and created Hive external tables using a shared metastore instead of Derby, with partitioning, dynamic partitioning, and buckets, and used HiveQL scripts to create, load, and query tables in Hive.
- Supported MapReduce programs running on the cluster.
- Developed graphical user interfaces using HTML, XML/XSLT, and JSPs for user interaction and CSS for styling.
- Developed SOAP (JAX-WS) web service applications using the contract-last approach.
- Extensively developed stored procedures, triggers, functions, and packages in Oracle SQL and PL/SQL.
- Developed JavaBeans for the form beans and Action classes for the Struts framework.
- Performed advanced procedures such as text analytics and processing using the in-memory computing capabilities of Spark with Scala.
- Ran various Hive queries on the data dumps and generated aggregated datasets for downstream systems for further analysis.
- Led and programmed the recommendation logic for various clustering and classification algorithms in Java (a toy sketch follows this list).
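As a toy illustration of the similarity scoring behind such recommendation logic (the production algorithms and feature vectors are not reproduced here; everything below is a hypothetical shape sketch):

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public final class Recommender {
    // Cosine similarity between two equal-length feature vectors.
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na  += a[i] * a[i];
            nb  += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Indices of the n items most similar to the user's profile vector.
    static List<Integer> topN(double[] userProfile, List<double[]> items, int n) {
        return IntStream.range(0, items.size()).boxed()
                .sorted(Comparator.comparingDouble(
                        (Integer i) -> -cosine(userProfile, items.get(i))))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        double[] user = {1.0, 0.0, 0.5};
        List<double[]> items = Arrays.asList(
                new double[]{0.9, 0.1, 0.4},
                new double[]{0.0, 1.0, 0.0});
        System.out.println(topN(user, items, 1));   // [0]: the first item is closest
    }
}
```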
Environment: Hortonworks Hadoop, HDFS, Hive, HQL scripts, Scala, MapReduce, Storm, Java, HBase, Pig, Sqoop, Shell Scripts, Oozie, SQL, Cloudera Manager, Cassandra
Confidential
Java Developer
Responsibilities:
- Involved in various stages of Software Development Life Cycle (SDLC) deliverables of the project using the Agile software development methodology.
- Involved in daily Scrum meetings, sprint planning, and estimation of tasks for user stories; participated in retrospectives and presented the demo at the end of each sprint.
- Gathered and clarified requirements with business analysts to feed into the high-level design.
- Responsible for front-end development and client-side validations using jQuery, JavaScript, and Bootstrap.
- Developed highly productive dynamic web applications with Groovy on Grails.
- Interacted with the client in design and code review meetings.
- Developed an intranet web application using J2EE architecture, with JSP to design the user interfaces and Hibernate for database connectivity.
- Designed the architecture of the project per the Spring MVC framework (see the controller sketch below).
- Worked with Spring Core, Spring AOP, and the Spring Integration framework with JDBC.
- Implemented Spring MVC and business workflows using Spring Web Flow.
- Used an Oracle 10g database and SQL to perform data mapping and backend testing; documented all SQL queries for future testing purposes.
- Modified and accessed the database using SQL, PL/SQL, stored procedures, triggers, and views.
- Provided support for all SOA WebLogic environments (Oracle SOA/BPM & OSB Suite 11g).
- Used Log4j and JUnit for logging and testing.
- Built the application using Groovy/Grails as a REST API server and Backbone.js as a single-page application.
- Created XML-based configuration and property files for the application and developed parsers using JAXP, SAX, and DOM technologies.
- Involved in understanding the functionality and process flow.
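A minimal Spring MVC controller in the style of the architecture described above; the URL pattern, attribute, and view names are hypothetical:

```java
import org.springframework.stereotype.Controller;
import org.springframework.ui.Model;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestMethod;

@Controller
public class AccountController {

    @RequestMapping(value = "/accounts/{id}", method = RequestMethod.GET)
    public String viewAccount(@PathVariable("id") long id, Model model) {
        // In the real flow, a Spring-managed service backed by Hibernate would
        // load the entity; a plain attribute keeps this sketch self-contained.
        model.addAttribute("accountId", id);
        return "accountDetail";  // resolved by the configured ViewResolver to a JSP
    }
}
```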
Environment: JDK 1.5, J2EE, Struts, Servlets, Spring, Hibernate, AJAX, HTML, CSS, XML, Ant, JavaScript, Oracle 10g, RAD 7.5, VSS, WebSphere 5.x, Log4j, Rational Rose, JUnit, SQL, PL/SQL, SOAP