Sr. Big Data Developer Resume
Hartford, CT
SUMMARY
- 10+ years of IT experience in analysis, design, development, implementation, integration, and testing of application software in web-based environments, distributed n-tier products, and Client/Server architectures.
- 5+ years of Big Data/Hadoop experience using the Hadoop stack (MapReduce programming, Pig, Hive, Oozie, Sqoop, Flume, Spark/Spark SQL, Storm, Kafka, YARN/MRv2) and NoSQL technologies such as Cassandra, HBase and MongoDB.
- Good knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, ResourceManager and YARN concepts.
- Extensive experience with Hadoop distributions such as Apache, IBM BigInsights, Cloudera, Hortonworks and MapR.
- Expert in importing data using Sqoop into HDFS from various Relational Database Systems.
- Worked on standards and proof of concept in support of CDH4 and CDH5 implementation using AWS cloud infrastructure.
- Hands-on experience in implementing complex business logic and optimizing queries using HiveQL, controlling data distribution through partitioning and bucketing techniques to enhance performance (see the sketch following this summary).
- Good knowledge in Architecting, Designing, re-Engineering and Performance Optimization.
- Experience with Oozie Workflow Engine in running workflow jobs with actions that run Hadoop Map Reduce and Pig jobs.
- Used Talend Open Studio for Big Data Integration and ETL operations.
- Very Good Knowledge in Logical and Physical Data modeling, creating new data models, data flows and data dictionaries.
- Experience in understanding the security requirements for Hadoop and integrating with Kerberos authentication and authorization infrastructure.
- Experience in importing and exporting the data using Sqoop from HDFS to Relational Database systems/mainframe and vice-versa.
- Hands-on experience in using relational databases like Oracle, MySQL and MS-SQL Server.
- Experienced in integrating various data sources such as RDBMS, shell scripts, spreadsheets, and text files into Java applications.
- Worked with and gained strong knowledge of Amazon Web Services (AWS) cloud services such as EC2, S3, EBS, RDS and VPC.
- Ability to spin up different AWS instances, including EC2-Classic and EC2-VPC, using CloudFormation templates.
- Experience in Web Services using XML, HTML and SOAP.
- Excellent Java development skills using Java 6/7/8, J2EE, J2SE, Servlets, Junit, JSP, JDBC.
- Experience with middleware architectures using Sun Java technologies such as J2EE, JSP, and Servlets, and application servers such as WebSphere and WebLogic.
- Familiarity with popular frameworks like Struts, Hibernate, Spring MVC and AJAX.
- Exceptional ability to learn and master new technologies and to deliver results against short deadlines.
- Quick Learner, with high degree of passion and commitment in work.
- Proficient in mentoring and onboarding engineers new to Hadoop and getting them up to speed quickly.
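A minimal sketch of the Hive partitioning and bucketing approach referenced above, submitted through the HiveServer2 JDBC driver. The endpoint, table, and column names are hypothetical and used only for illustration.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PartitionedTableSetup {
    public static void main(String[] args) throws Exception {
        // Hypothetical HiveServer2 endpoint; replace host/port/database as appropriate.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://hive-host:10000/analytics");
             Statement stmt = conn.createStatement()) {

            // Partition by load date so date-filtered queries prune whole directories;
            // bucket (cluster) by customer_id so joins/sampling on that key touch fewer files.
            stmt.execute(
                "CREATE TABLE IF NOT EXISTS claims_curated ("
              + "  claim_id STRING, customer_id STRING, amount DOUBLE)"
              + " PARTITIONED BY (load_date STRING)"
              + " CLUSTERED BY (customer_id) INTO 32 BUCKETS"
              + " STORED AS ORC");

            // Dynamic-partition, bucketed insert from a hypothetical raw staging table.
            stmt.execute("SET hive.exec.dynamic.partition.mode=nonstrict");
            stmt.execute("SET hive.enforce.bucketing=true");
            stmt.execute(
                "INSERT OVERWRITE TABLE claims_curated PARTITION (load_date)"
              + " SELECT claim_id, customer_id, amount, load_date FROM claims_raw");
        }
    }
}
```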
TECHNICAL SKILLS
Big Data Technology: HDFS, MapReduce, HBase, Pig, Hive, SOLR, Sqoop, Flume, MongoDB, Cassandra, Puppet, Oozie, Zookeeper, Spark, Kafka, Talend
Java/J2EE Technology: JSP, JSF, Servlets, EJB, JDBC, Struts, Spring, Spring MVC, Spring Portlet, Spring Web Flow, Hibernate, iBATIS, JMS, MQ, JCA, JNDI, Java Beans, JAX-RPC, JAX-WS, RMI, RMI-IIOP, EAD4J, Axis, Castor, SOAP, WSDL, UDDI, JiBX, JAXB, DOM, SAX, MyFaces (Tomahawk), Facelets, JPA, Portal, Portlet, JSR 168/286, LifeRay, WebLogic Portal, LDAP, JUnit
Hadoop Distribution: Cloudera, Hortonworks, IBM Big Insights
Cloud Computing Service: AWS (Amazon Web Services)
Languages: Java (5/6/7/8), C/C++, Swing, SQL, PL/SQL, HTML, CSS, i18n, l10n, DHTML, XML, XSD, XHTML, XSL, XSLT, XPath, XQuery, UML, JavaScript, AJAX (DWR), jQuery, Dojo, ExtJS, Shell Scripts, Perl
Development Framework/IDE: RAD 8.x/7.x/6.0, IBM WebSphere Integration Developer 6.1, WSAD 5.x, Eclipse Galileo/Europa/3.x/2.x, MyEclipse 3.x/2.x, NetBeans 7.x/6.x, IntelliJ 7.x, Workshop 8.1/6.1, Adobe Photoshop, Adobe Dreamweaver, Adobe Flash, Ant, Maven, Rational Rose, RSA, MS Visio, OpenMake Meister
Web/Application Servers: WebSphere Application Server 8.x/7.0/6.1/5.1/5.0, WebSphere Portal Server 7.0/6.1, WebSphere Process Server 6.1, WebLogic Application Server 8.1/6.1, JBoss 5.x/3.x, Apache 2.x, Tomcat 7.x/6.x/5.x/4.x, MS IIS, IBM HTTP Server
Databases: Oracle 11g/10g/9i/8i, DB2 9.x/8.x, MS SQL Server 2008/2005/2000, MySQL
NoSQL: HBase, Cassandra, MongoDB
Reporting Tools: Tableau, Datameer
Operating Systems: Windows XP, 2K, MS-DOS, Linux (Red Hat), Unix (Solaris), HP UX, IBM AIX
Version Control: CVS, SourceSafe, ClearCase, Subversion
Monitoring Tools: Embarcadero J Optimizer 2009, TPTP, IBM Heap Analyzer, Wily Introscope, Jmeter
Other: JBoss Drools 4.x, REST, IBM Lotus WCM, MS ISA, CA SiteMinder, BMC WAM, Mingle
PROFESSIONAL EXPERIENCE
Confidential - Hartford, CT
Sr. Big Data Developer
Responsibilities:
- Act as a liaison for visualizers, business and product teams for their specific data and reporting needs.
- Manage Agile Software Practice using Rally by creating Product Backlogs, Iterations and Sprints in collaboration with business and product teams.
- Work with cloud services such as Amazon Web Services (AWS); involved in ETL, data integration and migration.
- Import data using Sqoop to load data from MySQL into HDFS on a regular basis.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging the data in HDFS for further analysis.
- Collect and aggregate large amounts of web log data from different sources such as web servers, mobile and network devices using Apache Kafka, and store the data in HDFS for analysis.
- Develop multiple Kafka producers and consumers from scratch to implement the organization's requirements (see the producer sketch at the end of this section).
- Responsible for creating and modifying topics (Kafka queues) as required, with varying configurations covering replication factor, partitions and TTL.
- Write and test complex MapReduce jobs for aggregating identified and validated data
- Create Managed and External Hive tables with static/dynamic partitioning
- Extensively involved in performance tuning of the HiveQL by performing bucketing on large Hive tables
- Implement Spark applications using Scala and Spark SQL for faster testing and processing of data.
- Develop equivalent Spark Scala code for existing SAS code to extract summary insights from the Hive tables.
- Extensive experience in writing Pig scripts to transform raw data from several data sources into baseline data.
- Implement workflows using Oozie for running MapReduce jobs and Hive queries.
- Implement advanced procedures such as text analytics and processing using the in-memory computing capabilities of Apache Spark, written in Scala.
- Develop and execute shell scripts to automate the jobs.
- Write complex Hive queries and UDFs.
- Involved in converting Hive/SQL queries into Spark transformations using Spark RDDs, Python and Scala.
- Develop multiple POCs using PySpark, deploy them on the YARN cluster, and compare the performance of Spark with Hive and SQL/Teradata.
- Analyze SQL scripts and design solutions to implement them using PySpark.
- Involved in loading data from UNIX file system to HDFS.
- Extract the data from Teradata into HDFS using Sqoop.
- Handle importing of data from various data sources, perform transformations using Hive, MapReduce and Spark, and load data into HDFS.
- Involved in analysis, design, testing phases and responsible for documenting technical specifications.
- Develop Kafka producer and consumers, HBase clients, Spark and Hadoop MapReduce jobs along with components on HDFS, Hive.
- Very good understanding of Partitions, Bucketing concepts in Hive and designed both Managed and External tables in Hive to optimize performance.
- Work on the core and Spark SQL modules of Spark extensively.
- Experienced in running Hadoop streaming jobs to process terabytes of data.
- Involved in importing the real-time data to Hadoop using Kafka and implemented the Oozie job for daily imports.
Environment: Hadoop, Amazon Web Services (AWS), HDFS, MapReduce, Hive, Sqoop, Apache Kafka, Zookeeper, Spark, HBase, Python, Shell Scripting, Oozie, Maven, Hortonworks
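A minimal sketch of the kind of Kafka producer described above, using the standard Kafka Java client. The broker list, topic name, and message payload are hypothetical placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class WebLogProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Hypothetical broker list; in practice this comes from configuration.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("acks", "all"); // wait for full in-sync-replica acknowledgement
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key by source host so events from the same server land in the same partition.
            producer.send(new ProducerRecord<>("weblogs", "web01", "GET /offers 200 123ms"));
        }
    }
}
```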
Confidential - Minneapolis, MN
Hadoop Developer
Responsibilities:
- Built a suite of Linux scripts as a framework for easily streaming data feeds from various sources onto HDFS.
- Wrote multiple MapReduce programs in Java 8 for data extraction, transformation and aggregation from multiple file formats, including XML, JSON, CSV and compressed file formats.
- Wrote Interface specifications to ingest structured data into appropriate schemas and tables to support the rules and analytics.
- Involved in migration from the Hadoop system to the Spark system.
- Involved in the end-to-end process of Hadoop cluster setup, including installation, configuration and monitoring of the cluster.
- Responsible for cluster maintenance, cluster monitoring, troubleshooting, and managing and reviewing Hadoop log files; installed various Hadoop ecosystem components.
- Responsible for installation and configuration of Hive, Pig, Oozie, HBase and Sqoop on the Hadoop cluster.
- Used the Spark Streaming API with Kafka to build live dashboards; worked on transformations and actions on RDDs, Spark Streaming, MapReduce, pair RDD operations, partitioners, checkpointing, and SBT.
- Refactored existing Hive queries to Spark SQL (see the sketch at the end of this section).
- Involved in moving all log files generated from various sources to HDFS for further processing through Flume.
- Extracted the data from Teradata into HDFS using Sqoop and exported the patterns analyzed back into Teradata using Sqoop.
- Processed HDFS data and created external tables using Hive and developed scripts to ingest and repair tables that can be reused across the project.
- Used Talend Open Studio to perform ETL aggregations in Hadoop HIVE & PIG.
- Wrote Pig Scripts to generate MapReduce jobs and performed ETL procedures on the data in HDFS.
- Developed Pig Scripts, Pig UDFs and Hive Scripts, Hive UDFs to load data files.
- Used Hive schemas to create relations in Pig using HCatalog.
- Developed Java MapReduce and Pig cleansers for data cleansing.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using the Scala IDE for Eclipse.
- Implemented Machine Learning Models like K-means clustering using PySpark.
- Used Spark to create reports for analysis of the data coming from various sources like transaction logs.
- Used Oozie Operational Services for batch processing and scheduling work flows dynamically.
- Used Maven extensively for building jar files of MapReduce programs and deployed to cluster.
- Managed Agile Software Practice using Rally by creating Product Backlog, Iterations and Sprints in collaboration with the Product Team.
Environment: Hadoop, HDFS, Hive, Pig, MapReduce, YARN, Datameer, Sqoop, Flume, Oozie, Linux, Teradata, HCatalog, Java 8, Eclipse IDE, GIT.
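A minimal sketch of refactoring a Hive query into Spark SQL as mentioned above, using the Spark Java API against the existing Hive metastore. The table and column names are hypothetical.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class HiveToSparkSql {
    public static void main(String[] args) {
        // enableHiveSupport() lets Spark read existing Hive metastore tables directly.
        SparkSession spark = SparkSession.builder()
                .appName("HiveToSparkSql")
                .enableHiveSupport()
                .getOrCreate();

        // The same aggregation that previously ran as a Hive/MapReduce job,
        // now executed by the Spark SQL engine (hypothetical table/columns).
        Dataset<Row> dailyHits = spark.sql(
                "SELECT log_date, page, COUNT(*) AS hits "
              + "FROM weblogs GROUP BY log_date, page");

        dailyHits.write().mode(SaveMode.Overwrite).saveAsTable("weblog_daily_hits");
        spark.stop();
    }
}
```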
Confidential - Louisville, KY
Hadoop Developer
Responsibilities:
- Designed, developed and tested MapReduce programs on Mobile Offers Redemptions and sent the output to downstream applications such as HAVI; scheduled these MapReduce jobs through an Oozie workflow.
- Ingested a huge volume of XML files into Hadoop by utilizing DOM parsers within MapReduce; extracted daily sales, hourly sales, and product mix of the items sold in Yum Brands restaurants and loaded them into the Global Data Warehouse.
- Wrote and tested the MapReduce code to do aggregations on identified and validated data.
- Scheduled multiple MapReduce jobs in Oozie; involved in extracting promotions data for all stores within the USA by writing MapReduce jobs and automating them with UNIX shell scripts.
- Gathered business requirements in meetings for successful POC implementation and its move to production.
- Implemented a POC to migrate MapReduce jobs into Spark RDD transformations using the Scala IDE for Eclipse.
- Implemented different machine learning techniques in Scala using Scala machine learning library.
- Developed Spark applications using Scala for easy Hadoop transitions.
- Created RDDs in Spark using Scala and Python.
- Closely worked with Admin team to gather hardware for Data nodes, edge nodes, and Name nodes.
- Successfully loaded files to Hive and HDFS from Oracle, Netezza and SQL Server using SQOOP.
- Used Talend Open Studio to load files into Hadoop HIVE tables and performed ETL aggregations in Hadoop HIVE.
- Designed & Created ETL Jobs through Talend to load huge volumes of data into Cassandra.
- Used Sqoop to import data from SQL server to Cassandra.
- Worked on analyzing, writing Hadoop MapReduce jobs using Java API, Pig and Hive.
- Developed machine learning algorithms using Mahout for data mining on the data stored in HDFS.
- Used Flume extensively in gathering and moving log data files from Application Servers to a central location in Hadoop Distributed File System (HDFS).
- Worked with the Oozie workflow manager to schedule Hadoop jobs and other highly intensive jobs.
- Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, manage and review data backups, manage and review Hadoop log files.
- Loaded data into Hive tables and extensively used HiveQL queries to query data in the Hive tables.
- Introduced Tableau Visualization to Hadoop to produce reports for Business and BI team.
- Created UDFs in Pig and Hive and applied partitioning and bucketing techniques in Hive for performance improvement (see the UDF sketch at the end of this section).
- Created indexes and tuned SQL queries in Hive; involved in database connectivity using Sqoop.
- Involved in Hadoop NameNode metadata backups and load balancing as part of cluster maintenance and monitoring.
- Worked on Spark with Python and Scala.
- Serialized data in Hadoop using Avro serialization system.
- Used File System Check (FSCK) to check the health of files in HDFS.
- Monitored Nightly jobs to export data out of HDFS to be stored offsite as part of HDFS backup.
- Used Pig for analysis of large data sets and loaded the results back into HBase with Pig.
- Scheduled, monitored and debugged various MapReduce, Pig, Hive jobs using Oozie Workflow.
- Designed and deployed a Storm cluster integrated with Kafka and HBase.
- Implemented authentication and authorization service using Kerberos authentication protocol.
Environment: Hadoop 1.2.1, MapReduce, Sqoop 1.4.4, Hive 0.10.0, Flume 1.4.0, Oozie 3.3.0, Pig 0.11.1, HBase 0.94.11, Scala, Python, Zookeeper 3.4.3, Talend Open Studio, Kafka, Storm, Oracle 11g/10g, Apache Cassandra, SQL Server 2008, MySQL 5.6.2, Java 7, SQL, PL/SQL, Toad 9.7, Eclipse Kepler IDE, Microsoft Office 2007, MS Outlook 2007, SharePoint TeamSite
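A minimal sketch of a Hive UDF of the sort referenced above, using the classic Hive UDF API available in this Hive version. The class name and normalization rule are hypothetical.

```java
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// Registered in Hive with:  ADD JAR udfs.jar;  CREATE TEMPORARY FUNCTION normalize_store AS '...';
public final class NormalizeStoreId extends UDF {
    public Text evaluate(Text raw) {
        if (raw == null) {
            return null; // Hive passes NULLs through
        }
        // Hypothetical rule: trim, upper-case, and left-pad the store code to 6 characters.
        String s = raw.toString().trim().toUpperCase();
        while (s.length() < 6) {
            s = "0" + s;
        }
        return new Text(s);
    }
}
```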
Confidential - San Francisco, CA
Hadoop Developer
Responsibilities:
- Analyzed, Designed and developed the system to meet the requirements of business users.
- Participated in the design review of the system to perform Object Analysis and provide best possible solutions for the application.
- Imported and exported terabytes of data using Sqoop from HDFS to Relational Database Systems.
- Developed MapReduce Jobs using Hive and Pig.
- Collected and aggregated large amounts of log data using Apache Flume and staged the data in HDFS for further analysis.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing (see the sketch at the end of this section).
- Developed MapReduce (YARN) jobs for accessing and validating the data.
- Involved in managing and reviewing Hadoop log files.
- Loaded and transformed large sets of structured, semi-structured and unstructured data.
- Responsible for managing data coming from different sources.
- Involved in loading data from UNIX file system to HDFS.
- Installed and configured Hive and wrote HiveQL scripts.
- Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
- Implemented Partitioning, Dynamic Partitions, Buckets in HIVE.
- Monitored system health and logs and responded accordingly to any warning or failure conditions.
- Used ClearCase for version control.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, Hadoop distributions of Hortonworks and Cloudera, Flat files, Oracle 11g/10g, UNIX Shell Scripting, ClearCase, JUnit
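A minimal sketch of the Java MapReduce data-cleaning step mentioned above, assuming hypothetical comma-delimited input with a fixed field count; the record format is illustrative.

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Map-only cleansing job: emit only well-formed records, count and drop the rest.
public class CleanRecordsMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    private static final int EXPECTED_FIELDS = 7; // hypothetical record width

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString().trim();
        String[] fields = line.split(",", -1);

        if (line.isEmpty() || fields.length != EXPECTED_FIELDS) {
            context.getCounter("cleansing", "malformed").increment(1);
            return; // skip malformed rows
        }
        context.write(new Text(line), NullWritable.get());
    }
}
```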
Confidential - Horsham, PA
Java Developer
Responsibilities:
- Involved in the analysis, Design, Coding, Modification and implementation of User Requirements in the Electronic Credit File Management system.
- Designed the application using Front Controller, Service Controller, MVC, Session Facade Design Patterns.
- The application is designed using MVC Architecture.
- Implemented the required functionality using Hibernate for persistence and the Spring Framework.
- Used the Spring Framework for dependency injection (see the sketch at the end of this section).
- Designed and implemented the Hibernate Domain Model for the services.
- Developed UI using HTML, JavaScript and JSP and developed Business Logic and Interfacing components using Business Objects, XML, and JDBC.
- Designed the user interface and implemented validations using JavaScript.
- Involved in the design of JSPs and Servlets for navigation among the modules.
- Developed various EJBs for handling business logic and data manipulations from database.
- Managed connectivity using JDBC for querying/inserting & data management including triggers and stored procedures.
- Developed SQL queries and Stored Procedures using PL/SQL to retrieve and insert into multiple database schemas.
- Developed the XML schema and web services for data maintenance and structures; wrote test cases in JUnit for unit testing of classes.
- Used the DOM and DOM functions with Firefox and the IE Developer Toolbar for IE.
- Debugged the application using Firebug to traverse the documents.
- Involved in developing web pages using HTML and JSP.
- Provided technical support for production environments by resolving issues, analyzing defects, and providing and implementing solutions for those defects.
- Built and deployed Java applications into multiple UNIX based environments and produced both unit and functional test results along with release notes.
- Developed the presentation layer using CSS and HTML taken from Bootstrap to support multiple browsers.
Environment: Java 6, Spring, JSP, Hibernate, XML, HTML, JavaScript, JDBC, CSS, SOAP Web services.
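A minimal sketch of the Spring dependency-injection and Hibernate layering described above. The CreditFile entity, DAO, and service names are hypothetical; Hibernate mapping and Spring configuration are omitted for brevity.

```java
import org.hibernate.SessionFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Repository;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

// Hypothetical Hibernate-mapped entity (mapping omitted).
class CreditFile {
    private Long id;
    private long customerId;
    // getters/setters omitted
}

@Repository
class CreditFileDao {
    private final SessionFactory sessionFactory;

    @Autowired
    CreditFileDao(SessionFactory sessionFactory) { // SessionFactory injected by Spring
        this.sessionFactory = sessionFactory;
    }

    CreditFile findById(long id) {
        return (CreditFile) sessionFactory.getCurrentSession().get(CreditFile.class, id);
    }
}

@Service
public class CreditFileService {
    private final CreditFileDao dao;

    @Autowired
    public CreditFileService(CreditFileDao dao) { // constructor injection of the DAO
        this.dao = dao;
    }

    @Transactional(readOnly = true)
    public CreditFile creditFile(long id) {
        return dao.findById(id);
    }
}
```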
Confidential
Jr. Java Developer
Responsibilities:
- Involved in Development of master screens like Service Requests, Change Requests Screens.
- Design architecture following J2EE MVC framework.
- Developed interfaces using HTML and JSP pages with Struts as the presentation view.
- Developed with the Struts framework, configuring web.xml and struts-config.xml according to the framework's conventions.
- Developed and implemented Servlets running under JBoss.
- Used J2EE design patterns and Data Access Object (DAO) for the business tier and integration Tier layer of the project; Developed Java UI using swing.
- Used Java Message Service (JMS) for reliable and asynchronous exchange of important information between the clients and the customer
- Designed and developed Message driven beans that consumed the messages from the Java message queue.
- Developed database interaction code against the JDBC API, making extensive use of SQL query statements and prepared statements (see the sketch at the end of this section).
- Handled the complete Java multi-threading portion of the back-end components.
- Inspection/Review of quality deliverables such as Design Documents.
- Wrote SQL Scripts, Stored procedures and SQL Loader to load reference data.
- Used Web Services for interacting with a remote client to access data; Performed Unit Testing and Regression testing; Fixed the bugs identified in test phase; Used Junit for testing Java classes.
- Used Ant building tool to build the application.
Environment: J2EE (Java Servlets, JSP, Struts), MVC Framework, Apache Tomcat, Oracle8i, JMS, SQL, HTML, JDBC, EJB, ANT, Junit.
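A minimal sketch of the JDBC prepared-statement usage described above, assuming a hypothetical service_request table and Oracle connection details; in the application the connection would come from a pooled DataSource.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class ServiceRequestJdbc {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details for illustration only.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@dbhost:1521:ORCL", "app_user", "secret")) {

            String sql = "INSERT INTO service_request (request_id, status, description) VALUES (?, ?, ?)";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, 1001L); // bind parameters instead of concatenating SQL
                ps.setString(2, "OPEN");
                ps.setString(3, "Password reset for portal user");
                ps.executeUpdate();
            }
        }
    }
}
```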