Big Data/Hadoop Developer Resume
Prussia, PA
PROFESSIONAL SUMMARY
- Over 8 years of professional IT experience with proven expertise across the complete SDLC, comprising 4 years of hands-on experience in Hadoop ecosystem technologies and 4+ years of web-based and client-server business application development, with an emphasis on object-oriented Java/J2EE technologies.
- Extensive knowledge of and experience with Hadoop architecture and its ecosystem technologies for building high-volume, real-time distributed data processing, streaming, and batch applications.
- Experienced in building scalable, distributed data processing applications on HDFS using Hive, MapReduce, Spark, Sqoop, Flume, Pig, Oozie, Kafka, and Apache NiFi.
- Hands-on experience migrating data between relational database systems and HDFS, in both directions, using Sqoop and Hive.
- Experienced in writing MapReduce and Spark Streaming jobs to process text data; used Apache Kafka for data ingestion and Apache NiFi (HDF) for data flow integration.
- Involved in the complete application flow, from ingesting data from upstream systems into HDFS, through processing the data in HDFS and Extract, Transform, Load (ETL/ELT), to analysis of the data.
- Experience converting and processing file formats such as Text, Avro, SequenceFile, ORC, Parquet, JSON, and CSV.
- Developed enterprise applications using Java and J2EE technologies, including Hibernate 3.x/4.0, Spring MVC, JSP, JavaScript, jQuery, XML, and HTML; SOA and web services (SOAP, REST); UML; and design/architectural patterns. Experienced in application development and deployment on application servers such as Apache Tomcat 6.0/7.0 and WebLogic.
- Extensive experience in all phases of the SDLC and in Agile methodology.
- Experience working with Hortonworks (HDP, HDF) and Cloudera.
- Worked on distributed cluster setup and configuration, and on data analysis.
- Experience working with databases such as Oracle, MySQL, SQL Server, and NoSQL databases, including writing stored procedures, functions, joins, and triggers.
- Performed unit testing using JUnit and wrote integration test cases. Built and deployed applications using ANT and Maven, and used Log4j for logging and debugging.
- Experienced in developing architectural and network diagrams, and UML diagrams such as use case, class, and sequence diagrams, using MS Visio.
- Proactive, highly motivated, organized, and results-oriented, with excellent interpersonal skills and strong verbal and written communication skills. Self-motivated and able to work well in teams of diverse professionals.
TECHNICAL SKILLS
Big Data Ecosystem: Apache Hadoop 1.0/2.x, HDFS, YARN, MapReduce, Hive 1.2, Sqoop, Impala, Flume 1.4/1.5, Spark 2.1, Spark SQL, Spark Streaming, Cloudera (CDH4 and CDH5), Hortonworks HDP (2.5, 2.6), Apache Kafka, Apache NiFi, ZooKeeper, Oozie, Zeppelin, Avro, ORC, Parquet, XML, CSV.
Databases: SQL, SQL Server 2008/2012, Oracle 11g/10g, MySQL, DB2, HBase, MongoDB.
Programming Languages: Java, Groovy/Grails, Scala, Python, PL/SQL.
Frameworks: MVC, Struts, Hibernate 3.x/4.2, Spring 2.5/3.2, Spring Boot.
Java/J2EE and Web Technologies: Java 6/7, JMS, ActiveMQ, Grails 2.3, Web Services, WSDL, SOAP, REST, XML, XSLT, JAXB, JSON, JSP, JavaScript, HTML, CSS, AngularJS, Ajax, jQuery, Jetty, Mule ESB 3.3, JUnit, Log4j.
Web/Application servers: Apache Tomcat 6.0/7.0, WebLogic, JBoss.
Version control: SVN, CVS, GIT.
Tools and IDE: Eclipse, IntelliJ, Putty, Maven, ANT, Jenkins, Jira, SQL Developer, Informatica, Talend.
Operating Systems: UNIX, Red Hat Linux, Ubuntu Linux, CentOS, and Windows.
PROFESSIONAL EXPERIENCE
Confidential, Prussia, PA
Big Data/Hadoop Developer
Responsibilities:
- Responsible for designing, developing, testing, tuning, and globally deploying software solutions within the Hadoop ecosystem.
- Design and develop program logic for new applications or analyze and modify logic in existing applications.
- Extract, transform, load, and present disparate datasets in multiple formats from multiple sources, including JSON, Avro, text files, Hortonworks DataFlow (HDF, NiFi), and log data.
- Ingested machine data from manufacturing sites at various global locations, SAP ECC data, and lab information data into the centralized Hadoop data lake and into Hive.
- Implemented custom Kafka encoders for a custom input format to load data into custom partitions.
- Developed ETL scripts based on technical specifications, and imported data into HDFS using Sqoop.
- Wrote Spark jobs using SparkContext, RDDs (transformations and actions), and DataFrames to transform data from relational sources.
- Used the Scala shell to develop Spark scripts.
- Wrote a custom Avro2CSVProcessor in Java to convert Avro files to CSV for use in Hortonworks DataFlow with NiFi.
- Loaded and transformed large sets of structured and semi-structured data using Pig scripts.
- Hands-on experience using Hortonworks Ambari for cluster configuration, updates, and upgrades.
- Used Apache Kafka and Spark Streaming to build a real-time pipeline for streaming data.
- Code, test, debug, document, implement, and maintain software applications. Analyze requirements, and test and integrate application components. Ensure that system improvements and upgrades are successful.
- Work with data scientists to analyze potential use cases and translate them into design specifications.
- Review HDFS usage and system design for future scalability and fault tolerance.
- Support manufacturing data capture, trend reporting, and analytics, with cross-site data exchange.
Environment: Apache Hadoop, Hortonworks (HDP 2.6), HDFS 2.7, YARN, Hive 1.2, Sqoop 1.4, Pig, Tez, Flume, ZooKeeper 3.4, Oozie 4.2, MapReduce, HiveServer2 (Beeline), Kafka 0.10, Spark 1.6, Apache NiFi (HDF), Zeppelin, Spark SQL/Streaming, Java 8, Scala, Python, SQL Server 2012, Oracle 11g, XML, JSON, REST, GIT, QlikView, Shell scripting, Red Hat 6.7, Putty, IntelliJ, Jira, Maven.
Confidential, Duluth, GA
Big Data/Hadoop Developer
Responsibilities:
- Worked in an Agile development environment; evaluated business requirements and prepared the business requirement and design documents.
- Participated in design reviews and daily project scrums.
- Interacted with business clients and data source teams to gather business process and technical details about how the data was generated.
- Imported data from various data sources, performed transformations using Hive and Pig, and loaded the data into HDFS for aggregation.
- Ingested raw data into HDFS in batch mode using Sqoop and SFTP with the FS shell. Set up and configured Flume for real-time data ingestion.
- Implemented Hive partitioning and bucketing, executed different types of joins on Hive tables, and implemented Hive SerDes such as JSON and Avro.
- Experience with NoSQL databases such as HBase and MongoDB.
- Worked on the full cycle of requirements gathering, development, testing, and support for a vendor integration project processing ANSI EDI 850, 855, 856, 810, and 997 files.
- Wrote Sqoop incremental import jobs to move new and updated data from the database to HDFS.
- Developed MapReduce jobs for log analysis to generate reports on failovers.
- Created and consumed SOAP web service messages (WSDL), secured them with X.509 signature verification, and implemented authentication using security certificates.
- Used HTML, CSS, JavaScript, Ajax, GSP for web pages and Groovy for controllers.
- Created HBase tables to store variable data formats coming from different portfolios, and performed real-time analytics on HBase using the Java API and REST API.
- Used the Apache POI and iText libraries to generate various forms in complex formats.
- Used Apache Spark to run Scala code for JSON data processing.
- Wrote Python scripts to parse XML documents and load the data into a database.
- Created visual trends and calculations in Tableau on customer and product data.
- Extensive business knowledge of the healthcare industry and its terminology (HIPAA, ICD, ANSI EDI, DME, TT, PT, etc.), including ancillary services and the insurance claims process.
- Wrote unit and integration test cases.
Environment: Apache Hadoop, MapReduce, Java, HDFS, Hive, Sqoop, Spark, Spring, Hibernate, Apache Tomcat, JavaScript, XML, XSLT, Web services, SOAP, REST, ActiveMQ, Mule ESB, Tableau, Talend, Eclipse, Putty, GIT, Jenkins, Maven, Grails/Groovy, Windows, Linux.
Confidential, Columbus, OH
Java-Hadoop Developer
Responsibilities:
- Developed multiple MapReduce programs in Java for data extraction, transformation, and aggregation across multiple file formats, including XML, JSON, and CSV.
- Developed Java programs to process large JSON files received from the marketing team and convert them into the format standardized for the application.
- Built a proof of concept on Spark for integration source transformations.
- Gained experience with Amazon Web Services (EC2, S3, EMR) and imported data (CSV, plain text) from Amazon S3 using HDFS commands.
- Developed SQL stored procedures and prepared statements for updating and accessing data.
- Used Spring AOP to implement logging and to obtain data source objects, as advice woven into the bean classes.
- Applied design patterns including MVC Pattern, Facade Pattern, Abstract Factory Pattern, DAO Pattern and Singleton.
- Applied optimizations to Hive tables for faster querying, and performance tuning techniques to MapReduce and Pig jobs to improve running time.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as MapReduce, Hive, Pig, and Sqoop.
- Performed various optimizations, such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
- Generated various marketing reports using Tableau and Spotfire.
- Conducted and participated in project team meetings to gather status and discuss issues and action items.
- Communicated deliverable status to users, stakeholders, and the client, and drove periodic review meetings.
Environment: Java, J2EE, Hadoop, Hortonworks, Spark, MapReduce, Hive, Sqoop, Impala, Microservices, Spring, Web services, REST, SOAP, Shell scripting, SVN, Linux, Putty, Oracle 11g, Tableau.
Confidential, Duluth, GA
Software Developer
Responsibilities:
- Used Flume to import log data from web server into HDFS.
- Worked with a large-scale distributed data solution on a Cloudera CDH4 cluster.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Created design documents, system test cases, unit test cases, review documents, and migration documents.
- Monitored MapReduce programs running on the cluster.
- Worked on Git for version control, JIRA for project tracking and Jenkins for continuous Integration.
- Involved in the design, development, and implementation of the front end of a widget-based application using HTML, CSS, jQuery, and JavaScript.
- Used Ajax and JSON with jQuery for request data and response processing.
- Experience in application performance tuning and troubleshooting.
- Developed the presentation tier in HTML and JavaServer Pages using the Struts MVC framework.
Environment: Apache Hadoop, Cloudera, HDFS, HBase, MongoDB, Hive, MapReduce, Oozie, JSON, SQL, AngularJS, XML, Java, Spring, web services, REST, HTML, CSS.
Confidential, Dallas, TX
Java/ J2EE Developer
Responsibilities:
- Involved in all the phases of SDLC including Requirements Collection, Design & Analysis of the Customer Specifications, and Development & Customization of the Application.
- Prepared Use case, Class and Sequence diagrams using Rational Rose tool.
- Developed the application under J2EE architecture using JSP, Struts, Java Beans, iBATIS Data mapper.
- Developed many JSP pages, using the Dojo JavaScript library and jQuery UI for client-side validation; client and server validation were handled using the Struts Validator.
- Developed UI using JSP, Controller using Struts Action Class, Model using Java Beans as POJO.
- Implemented Java Message Services (JMS) using JMS API.
- Performed Unit testing, System Testing and Integration Testing.
- Prepared the system test plan and test cases to suit business requirements and system specification documents.
- Used Maven to compile and generate EAR, WAR & JAR. Used Log4j for logging Errors.
- Used SVN for version control & source code management.
Environment: JDK 1.5, J2EE 1.4, JSP, Struts 1.3, Struts Tiles, validator, EJB 2.0 (Session, MDB, JMS), Hibernate 3.3, XML, UML, Oracle, BEA WebLogic Server 9.1, Eclipse 3.2, Ajax, Ant, JUnit, Log4j, Maven 1.9, CVS, Rational Rose, JavaScript, Red Hat Linux.
Confidential, Monroe, LA
Java/ J2EE Developer
Responsibilities:
- Involved in translating functional requirements into technical requirements.
- Performed document analysis and participated in technical feasibility discussions for implementing new functionality.
- Applied design patterns including MVC Pattern, Facade Pattern, Abstract Factory Pattern, DAO Pattern and Singleton.
- Developed front-end screens using JSP, Struts view tags, XSLT, DHTML, HTML5, CSS3, JavaScript, and Spring.
- Used exception handling and the Struts Validator framework. Strong knowledge of the Spring Framework, including IoC/AOP and Spring transaction support (declarative/programmatic), and of Hibernate.
- Involved in server side and front-end validation using Struts Validation framework and JavaScript.
- Developed various Database interaction objects by implementing the DAO patterns.
- Generated Spring XML files for the configured beans.
- Used Hibernate to map POJOs to relational database tables using XML mapping files.
- Used a SAX parser to parse XML documents.
- Involved in unit testing and bug fixing, and achieved maximum code coverage using JUnit test cases.
Environment: Core Java, Java 1.5, JSP, HTML, JavaScript, Struts 1.2, Hibernate 3.0, Spring 2.0, JSF, JMS, ANT, AJAX, Design Patterns, Servlets, Struts Tag Libraries/JSTL, XML, UML, JUnit, Oracle 10g, SVN, Web Services, Agile, Log4J, CSS, Windows XP.
Confidential
Software Engineer
Responsibilities:
- Interacted with clients and application users to gather requirements, specifications, and enhancements.
- Involved in design and development of Servlets and JSPs using Apache Struts framework.
- Used JDBC, Data Sources and Connection Pooling in Application server.
- Implemented J2EE design patterns such as Session Facade (to reduce network traffic) and Service Locator.
- Designed and developed a user usage logging facility using Apache Log4J.
- Implemented complete client-side validation in JavaScript.
- Used ANT to write build scripts as well as deployment scripts.
- Packaged and deployed the entire application to the integration testing environment for all releases.
- Involved in JUnit tests for the services and documented the services developed.
- Provided production support by interacting with the end-users and fixing bugs.
Environment: Java, J2EE, Struts 1.1, LINUX, JSP/Servlets, CSS, WebLogic, Eclipse 3.0, JDBC, XML, HTML, Oracle 9i, UML, JUnit, SVN, ANT 1.3/1.4, SOAP, Web Services.
