Big Data Developer Resume
Rochester, NY
SUMMARY
- A technology-driven professional with over 8 years of experience, combining strong leadership qualifications with hands-on development and extensive experience in the Big Data domain using Hadoop, Hive, and other open source tools and technologies in the Banking, Healthcare, and Insurance industries.
- Experienced with Hadoop ecosystem components such as Hadoop MapReduce, HBase, Oozie, Hive, Sqoop, Pig, and Flume, with the Cloudera and Hortonworks distributions, and with Cassandra.
- Experience in developing solutions to analyze large data sets efficiently.
- In-depth understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, and DataNode, and of MapReduce concepts.
- Extensive hands-on experience writing complex MapReduce jobs and Pig scripts and in Hive data modeling. Expertise in implementing Labeler and Standardizer transformations in Informatica Data Quality (IDQ) for data cleansing.
- 1+ years of comprehensive experience in Tableau (development and administration) and Big Data/Hadoop, including MapReduce (MR), Pig, Hive, Impala, Oozie, Flume, ZooKeeper, and NoSQL databases such as HBase, with exposure to MarkLogic.
- Excellent understanding of Hadoop distributed system architecture and design principles.
- Experience in converting MapReduce applications to Spark.
- Good working experience using Sqoop to import data from RDBMS into HDFS and to export data from HDFS back to RDBMS.
- Good knowledge of job scheduling and workflow design tools such as Oozie.
- Experience working with BI teams to translate big data requirements into Hadoop-centric technologies.
- Experience in performance tuning Hadoop clusters by gathering and analyzing information about the existing infrastructure.
- Good experience creating real-time data streaming solutions using Apache Spark, Spark Streaming, Apache Storm, Kafka, and Flume.
- Experience extending Hive and Pig core functionality by writing custom UDFs.
- Good understanding of Data Mining and Machine Learning techniques.
- Experience in handling messaging services using Apache Kafka.
- Experience in fine-tuning MapReduce jobs for better scalability and performance.
- Developed various MapReduce applications to perform ETL workloads on terabytes of data.
- Developed analytical components using Scala, Spark, Apache Mesos, and Spark Streaming.
- Experienced in developing and implementing web applications using Java, J2EE, JSP, Servlets, JSF, HTML, DHTML, EJB, JavaScript, AJAX, JSON, jQuery, CSS, XML, JDBC, and JNDI.
- Experience writing SQL and PL/SQL queries and stored procedures for accessing and managing databases such as SQL Server 2014/2012, MySQL, and IBM DB2.
- Working experience in Development, Production and QA Environments.
- Involved in all phases of Software Development Life Cycle (SDLC) in large scale enterprise software using Object Oriented Analysis and Design.
- Working experience with version control tools such as SVN, CVS, ClearCase, and PVCS.
TECHNICAL SKILLS
Languages: Java (Core Java, Networking, Threads, Swing), XML, XSD, XSL, JavaScript, Scala
J2EE Technologies: J2EE, Java Mail API
Web servers: Apache Tomcat Server, IBM WebSphere Application Server 5.0/6.0, WebLogic Application Server, JBoss 4.x
Server Side: JSP, Servlets, EJB, JDBC.
Frameworks/Components: Spring, Spring Batch, Struts, Hibernate.
Big Data: Hadoop HDFS, MapReduce.
Databases: SQL, MySQL, SQL Server, Oracle, DB2, MarkLogic, Microsoft Access
Unit Testing: JUnit, Rational
Methodologies: OOAD, RUP, UML, Design Patterns.
OS: Windows 2000/XP/Vista, Windows NT 4.0, Windows 2003, Linux, UNIX.
Markup Languages: HTML, XML, DHTML
Open Source API: Apache Commons IO/FileUpload/Net.
PROFESSIONAL EXPERIENCE
Confidential, Rochester, NY
Big Data Developer
Responsibilities:
- Implemented multiple MapReduce jobs in Java for data cleansing and preprocessing.
- Worked with the team to expand the cluster from 28 to 42 nodes; the additional DataNodes were configured through Hadoop's commissioning process.
- Created Spark SQL queries for faster requests. Designed and developed IDQ solutions for data profiling and cleansing in the Analyst tool. Worked on creating profiles, profile models, rules, and mappings with IDQ.
- Responsible for implementing Hadoop 2.0 (YARN) and testing Pig, Hive, and MongoDB job processes.
- Provided development, documentation, and implementation support on the MIDAS project (HHS client), which uses Hadoop (Java API) and MarkLogic Server to transact and store large volumes of data. The system also connects with other systems to download data on a daily, weekly, or monthly basis and saves it as a sequence file. MapReduce (Java API) was used for faster processing of large-scale data.
- Used Impala for aggregation jobs, running an average of 100K aggregation queries per day.
- Used Impala instead of Hive to optimize query performance. Worked in IDQ Analyst on profiling, creating profiling rules, and scorecards.
- Administered and supported the Hortonworks distribution. Designed IDQ mappings that are used as mapplets in PowerCenter.
- Managed and scheduled jobs on a Hadoop cluster. Created profiles using IDQ rules and filters.
- Involved in defining job flows and in managing and reviewing log files. Developed unit and system test cases, using system procedures to check data consistency and adherence to the defined data model. Checked data quality using IDQ.
- Provided recommendations and industry best practices for ETL solutions in SSIS.
- Installed the Oozie workflow engine to run multiple MapReduce, Hive (HQL), and Pig jobs.
- Collected log data from web servers and ingested it into HDFS using Flume. Performed large-scale data profiling with IDQ (Analyst tool) prior to data staging.
- Worked with the Hortonworks Data Platform. Subscribed to data using the Invenio application; the data is published on the Data Router to NFS and HDFS.
- Worked on reading multiple data formats on HDFS using Scala.
- Cassandra development: set up, configured, and optimized the Cassandra cluster. Developed a real-time, Java-based application that works with the Cassandra database.
- Developed SSIS transformations and event handlers for error handling and debugging of the packages.
- Developed analytical components using Scala, Spark, Apache Mesos, and Spark Streaming.
- Responsible for managing data coming from different sources.
- Processed data using Spark.
- Expertise in NoSQL databases like MarkLogic, Cassandra, MongoDB, HBase.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
- Worked on Hadoop (Hortonworks) ecosystem architecture, applying systems-level and network-level knowledge.
- Participated in requirements gathering from experts and business partners and converted the requirements into technical specifications.
- Constructed system components and developed the server side using Java, EJB, and the Spring Framework. Involved in designing the data model for the system.
- Used the following database back ends: MySQL, MS SQL Server, Oracle, SQLite, and Accumulo.
- Used J2EE design patterns such as DAO, Model, Service Locator, MVC, and Business Delegate.
- Defined Interface Mapping between JDBC Layer and Oracle Stored Procedures.
- Experience in managing and reviewing Hadoop log files.
- Tested out various other Mesos frameworks such as Kafka and tested isolation.
- Mentored other developers in building web services to run on elastic infrastructure such as Mesos
- Worked on NoSQL databases including Cassandra, MongoDB, MarkLogic, and HBase. Managed and reviewed Hadoop log files and worked with HCatalog to open up access to Hive's metastore.
- Created final tables in Parquet format in Impala.
- Used Impala to read, write, and query Hadoop data in HDFS or HBase.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Implemented Flume, Spark, and Spark Streaming frameworks for real-time data processing.
- Analyzed large data sets to determine the optimal way to aggregate and report on them.
- Analyzed the SQL scripts and designed a solution to implement them in Scala.
- Supported setting up the QA environment and updating configurations for implementing scripts with Pig and Sqoop. Worked on tuning the performance of Pig queries.
- Implemented a script to transmit sysprin information from Oracle to HBase using Sqoop.
- Implemented best income logic using Pig scripts and UDFs.
- Implemented test scripts to support test driven development and continuous integration.
Environment: Hadoop, MapReduce, Impala, Spark, Scala, Shark, Kafka, HDFS, ZooKeeper, Hive, Pig, Oozie, Core Java, Eclipse, HBase, Sqoop, MarkLogic, Flume, Accumulo, Oracle 10g, Cassandra, SQL.
Confidential, Melville, NY
Hadoop Developer
Responsibilities:
- Responsible for managing data coming from different sources; involved in HDFS maintenance and in loading structured and unstructured data.
- Developed data pipeline using Flume, Sqoop, Pig and Java MapReduce to ingest behavioral data into HDFS for analysis.
- Responsible for importing log files from various sources into HDFS using Flume.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Developed software to process, cleanse, and report on vehicle data using various analytics tools and REST APIs in languages such as Java and Scala.
- Extracted files from MongoDB through Sqoop, placed them in HDFS, and processed them.
- Created a customized BI tool for the management team that performs query analytics using HiveQL.
- Scheduled data pipelines for automation of data ingestion in AWS.
- Worked extensively with Microsoft tools for documentation and presentations: Visio, Word, PowerPoint, and Excel macros.
- Created partitions and buckets based on state for further processing with bucket-based Hive joins.
- Estimated hardware requirements for the NameNode and DataNodes and planned the cluster.
- Created reports for the BI team, using Sqoop to import data into HDFS and Hive.
- Created Hive generic UDFs, UDAFs, and UDTFs in Python to process business logic that varies by policy.
- Worked on migrating MapReduce programs into Spark transformations using Spark and Scala.
- Used Sqoop to move relational database data into Hive dynamic partition tables via staging tables.
- Extensively used SSIS transformations such as Lookup, Derived Column, Data Conversion, Aggregate, and Conditional Split, and tasks such as Execute SQL, Script, and Send Mail.
- Installed, configured, supported, and monitored Hadoop clusters using the Apache and Cloudera distributions and AWS.
- Optimized Hive queries using partitioning and bucketing techniques to control data distribution.
- Built and maintained scalable data pipelines using the Hadoop ecosystem and other open source components such as Hive and HBase.
- Demonstrated expertise with ETL tools, including SQL Server Integration Services (SSIS) and Informatica, with ETL package design, and with RDBMS systems such as SQL Server and Oracle.
- Designed MarkLogic solutions to provide highly available, fully scalable, high-speed ingestion of publication-oriented data. The solution allowed both transactional processing and warehouse reporting against the same system.
- Worked with Kafka on a proof of concept for log processing on a distributed system. Worked with the NoSQL database HBase to create tables and store data.
- Worked on custom Pig loaders and storage classes to handle a variety of data formats such as JSON and XML.
- Created Pig macros to improve code reusability and modularity.
- Involved in Cassandra data modeling and analysis and in CQL (Cassandra Query Language).
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Scala.
- Upgraded Apache Ambari, CDH, and HDP clusters.
- Configured and maintained different topologies in the Storm cluster and deployed them on a regular basis.
- Experienced with different compression techniques such as LZO, Gzip, and Snappy.
- Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs, such as Java MapReduce, Hive, Pig, and Sqoop.
- Expertise in NoSQL databases like MarkLogic, Cassandra, MongoDB, HBase.
- Created a data pipeline of MapReduce programs using chained mappers.
- Implemented an optimized join across different data sets in MapReduce to get the top claims by state.
- Prepared the complete data mapping for all the migrated jobs using SSIS.
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from UNIX, NoSQL, and a variety of portfolios. Involved in loading data from the Linux file system into HDFS.
- Used the MarkLogic Hadoop connector to import and export data between the MarkLogic database and HDFS.
- Used AWS for content storage and Elasticsearch for document search.
- Implemented MapReduce programs in Java to perform map-side joins using the distributed cache. Developed unit test cases using the JUnit, EasyMock, and MRUnit testing frameworks.
- Used macros to automatically color cells based on their values.
- Upgraded the Hadoop cluster (HBase/ZooKeeper) from CDH3 to CDH4.
- Created a complete processing engine based on Cloudera's distribution, enhanced for performance.
- Monitored the cluster using Cloudera Manager.
Environment: Hadoop, HDFS, HBase, Impala, MapReduce, Java, Scala, JDK 1.5, J2EE 1.4, Informatica, Struts 1.3, Hive, Pig, Macros, MarkLogic, Sqoop, Flume, Kafka, Oozie, Hue, Hortonworks, Storm, ZooKeeper, Avro files, AWS, SQL, ETL, Cloudera Manager, MySQL, MongoDB.
Confidential
Sr. Java/J2EE Developer
Responsibilities:
- Involved in the design and implementation of the MVC pattern.
- Extensively used XML, storing process details in the database as XML and using the stored XML whenever needed.
- Part of the core team that developed the process engine.
- Developed Action classes and validations using the Struts framework.
- Created project-related documentation such as role-based user guides.
- Implemented modules like Client Management, Vendor Management.
- Attended various Client meetings.
- Implemented Access Control Mechanism to provide various access levels to the user.
- Designed and developed the application using J2EE, JSP, XML, Struts, Hibernate, and Spring technologies.
- Coded DAO and Hibernate implementation classes for data access.
- Coded Spring service classes and transfer objects to pass data between layers.
- Designed the database for Jeevica in MS SQL Server 2008.
- Implemented web services using Axis.
- Used different features of Struts like MVC, Validation framework and tag library.
- Created detail design document, Use cases, and Class Diagrams using UML
- Wrote Ant scripts to build JAR, WAR, and EAR files.
- Developed a standalone Java component that interacts with Crystal Reports on Crystal Enterprise Server to view and schedule reports, store data as XML, and send data to consumers using SOAP.
- Deployed and tested the application on WebSphere Application Server.
- Developed JavaScript for client-side validation in JSPs.
- Developed JSPs with Struts taglibs for the presentation layer.
- Coordinated with the onsite, offshore, and QA teams to facilitate quality delivery from offshore on schedule.
Environment: Java 1.5, Spring, Spring Web Services, JSP, JavaScript, Hibernate, SOAP, CSS, Struts, WebSphere, MQ Series, JUnit, Apache, Windows XP and Linux
Confidential
Java Developer
Responsibilities:
- Designed a system and developed a framework using J2EE technologies based on MVC architecture.
- Involved in the iterative/incremental development of project application. Participated in the requirement analysis and design meetings.
- Used Apache Flume to ingest log data from multiple sources directly into Accumulo, file roll, and HDFS.
- Designed and developed UIs using JSP following the MVC architecture.
- Designed and developed Presentation Tier using Struts framework, JSP, Servlets, TagLibs, HTML and JavaScript.
- Designed the control components, including class diagrams and sequence diagrams, using Visio.
- Used the Struts framework in the application. Programmed the views as JSP pages with the Struts tag library; the model is a combination of EJBs and Java classes, and the web controllers are servlets.
- Generated XML pages with templates using XSL. Used JSP, Servlets, and EJBs on the server side.
- Developed and maintained a complete external build process using Ant.
- Implemented Home Interface, Remote Interface, and Bean Implementation class.
- Implemented business logic on the server side using session beans.
- Extensive usage of XML - Application configuration, Navigation, Task based configuration.
- Designed and developed Unit and integration test cases using Junit.
- Used EJB features effectively: local interfaces to improve performance, abstract persistence schema, and container-managed relationships (CMRs).
- Used Struts web application framework implementation to build the presentation tier.
- Wrote PL/SQL queries to access data from the Oracle database.
- Set up WebSphere Application Server and used Ant to build the application and deploy it to WebSphere.
- Prepared test plans and wrote test cases.
- Implemented JMS for making asynchronous requests
Environment: Java, J2EE, Struts, Hibernate, Accumulo, JSP, HDFS, Servlets, HTML, CSS, UML, jQuery, Log4j, XML Schema, JUnit, Tomcat, JavaScript, Oracle 9i, Unix, Eclipse IDE.
