- 10+ years of work experience in the IT industry, including over 3 years of professional experience in Big Data/Hadoop.
- Extensive knowledge of DevOps.
- Extensive experience with MapReduce (MRv1).
- Extensive experience testing, debugging, and deploying on MapReduce/Hadoop platforms.
- Extensive experience working with HDFS, Pig, Hive, Sqoop, Flume, Oozie, ZooKeeper, Python, Scala, Spark, and Cassandra.
- Experience with the Cloudera CDH4 and CDH5 distributions, and knowledge of CDH 5.8.1 Community and Express, Teradata, and AWS for deploying and running Hadoop and Spark applications.
- Extensive experience with ETL and big data query tools such as Pig Latin and HiveQL.
- Expertise in installing, designing, sizing, configuring, provisioning, and upgrading Hadoop environments.
- Experience tuning and troubleshooting performance issues in Hadoop clusters holding over 40 TB of data.
- Experience with monitoring, performance tuning, SLAs, scaling, and security in big data systems.
- Experience using file formats such as Parquet, Apache Avro, JSON, XML, and flat files.
- Hands-on NoSQL database experience with HBase and Cassandra.
- Worked with web services including RESTful, SOAP, WSDL, UDDI, JAX-RS, and JAXB.
- Experienced in implementing Spark machine learning algorithms for business analysis.
- Extensive experience documenting requirements, functional specifications, and technical specifications.
- Highly motivated, adaptive, and quick learner with excellent communication, analytical, problem-solving, and technical skills.
- Strong ability to handle multiple priorities and workloads, and to quickly understand and adapt to new technologies and environments.
- Experience with continuous integration and related tools (e.g., Jenkins, Hudson, Maven).
- Experience with code quality governance tools (Sonar, Gerrit, PMD, FindBugs, Checkstyle, Emma, Cobertura, JIRA, etc.).
- Experience with source code management tools (GitHub, SVN, CVS, ClearCase).
- Experience with Hamake or Cascading, and with stream processing on Hadoop using Storm.
Big Data/Hadoop Framework: HDFS, MapReduce, Pig, Hive, Sqoop, Kafka, Oozie, ZooKeeper, Flume, HBase, Spark
Databases: Microsoft SQL Server, MySQL, Oracle, Cassandra
Languages: C, C++, Java, Scala, Python, SQL, PL/SQL, Go, Pig Latin, HiveQL, COBOL
Web Technologies: JSP, JavaBeans, JDBC, XML, SOAP and RESTful Web Services, Spring, Hibernate, AngularJS, JSON, Servlets, JNDI, JMS, JMX, RMI, Java Web Services
Operating Systems: Windows, Unix and Linux
Development Tools: Microsoft SQL Studio, Eclipse, NetBeans, MySQL Workbench.
Office Tools: Microsoft Office Suite
Development Methodologies: Agile/Scrum, Waterfall
Confidential, Rijuven, Pennsylvania
Big Data Developer
- Worked on a live Big Data Hadoop production environment of 400 nodes on AWS.
- Worked with highly unstructured and semi-structured data totaling 40 TB.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Migrated HiveQL queries on structured data to Spark SQL to improve performance.
- Designed and developed Pig ETL scripts to process data in a nightly batch.
- Created Pig macros to improve code reusability and modularity.
- Developed Hive scripts for end-user and analyst ad-hoc analysis requirements.
- Very good understanding of partitioning and bucketing in Hive; designed both managed and external Hive tables for optimized performance.
- Solved performance issues in Hive and Pig scripts through an understanding of joins, grouping, and aggregation and how they translate to MapReduce jobs.
- Tuned Hive and Pig scripts to improve performance.
- Good experience troubleshooting performance issues and tuning Hadoop clusters.
- Good working knowledge of using Sqoop to perform incremental imports from Oracle to HDFS.
- Good experience working with compressed files and related formats.
- Developed Oozie workflows for scheduling and orchestrating the ETL process.
- Hands-on experience with Cassandra and its architecture.
- Performed data analysis on HBase data using Hive external tables mapped to HBase.
- Very good understanding of single points of failure (SPOF) in Hadoop daemons and the corresponding recovery procedures.
- Developed a code base to stream data from sample data files → Kafka spout → Storm bolt → HDFS bolt.
- Documented the data flow from application → Kafka → Storm → HDFS → Hive tables.
- Configured, deployed, and maintained a single-node Storm cluster in the DEV environment.
- Developed live streaming and predictive analytics using the Apache Spark Scala APIs.
- Developed solutions to preprocess large sets of structured and semi-structured data in different file formats (text files, Avro data files, sequence files, XML and JSON files, ORC and Parquet files).
- Used Go for application development, creating REST APIs for CRUD operations.
- Good understanding of Impala.
Environment: Scala, Python, Core Java, MapReduce, HDFS, Pig, Hive, HBase, Oracle 10g, MySQL, Ubuntu, Spark 4.1, Kafka, Storm 0.9.5, Cassandra 2.2.0, AWS.
- Installed and configured Hadoop MapReduce and HDFS; developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Wrote MapReduce code to process and parse data from various sources, storing the parsed data in HBase and Hive through HBase-Hive integration.
- Moved all log files generated by various sources to HDFS for further processing.
- Developed workflows using custom MapReduce, Pig, Hive, and Sqoop.
- Created various views for HBase tables and leveraged Hive's performance on top of HBase.
- Developed the Apache Storm and HDFS integration project for real-time data analysis.
- Designed and developed Apache Storm topologies for inbound and outbound data for real-time ETL to find the latest trends and keywords.
- Developed MapReduce programs for parsing information and loading it into HDFS.
- Built reusable Hive UDF libraries for business requirements, enabling users to apply these UDFs in Hive queries.
- Wrote a Hive UDF to sort struct fields and return a complex data type.
- Responsible for loading data from UNIX file system to HDFS.
- Developed a suite of unit test cases for Mapper, Reducer, and Driver classes using an MR testing library.
- Designed and developed a distributed processing system to process binary files in parallel and load the resulting analysis metrics into a data warehousing platform for reporting.
- Developed workflows in Control-M to automate loading data into HDFS and preprocessing it with Pig.
- Implemented cluster coordination services through ZooKeeper.
- Used Maven extensively to build JAR files of MapReduce programs and deployed them to the cluster.
- Modeled Hive partitions extensively for data separation and faster data processing, and followed Pig and Hive best practices for tuning.
Environment: Scala, Python, HiveQL, MySQL, HBase, HDFS, Eclipse, Hadoop, Flume, Pig, Sqoop, UNIX.
- Loaded data from different data sources (Teradata and DB2) into HDFS using Sqoop, then into partitioned Hive tables.
- Developed Hive UDFs to bring all customers' email addresses into a structured format.
- Developed Bash scripts to fetch log files from an FTP server and process them for loading into Hive tables.
- Used Sqoop to load data from DB2 into the HBase environment.
- Performed a daily INSERT OVERWRITE of the Hive data with HBase data to keep the data fresh.
- Scheduled all Bash scripts using the Resource Manager scheduler.
- Developed Oozie workflows for daily incremental loads that pull data from Teradata and import it into Hive tables.
- Developed Pig scripts to transform data into a structured format, automated through Oozie coordinators.
- Loaded data from MySQL into HBase where necessary using Sqoop.
- Developed Hive queries for analysis across different banners.
Environment: Windows 7, Hortonworks, Hadoop, HDFS, MapReduce, Sqoop, Hive, Pig, HBase, Teradata, DB2, Oozie, MySQL, Eclipse
- Developed the application server's persistence layer using JDBC and SQL.
- Used JDBC to connect the web applications to databases.
- Implemented a test-first unit testing framework driven by JUnit.
- Developed and used J2EE services and JMS components for messaging in WebLogic.
- Configured the development environment using WebLogic Application Server for developer integration testing.
Environment: Windows XP, Java/J2EE, SQL, Oracle 10g, JSP 2.0, AJAX, JavaScript, WebLogic 8.0, HTML, JDBC, Spring, Hibernate, JSON, AngularJS.
- Converted several of Chrysler’s financial applications, written in HPS, to web-based Java applications.
- Knowledge of both the source and target environments and technologies enabled a key role in the development, testing, and deployment phases of the converted applications; also helped prepare the design and functional specification documents.
- Developed the presentation layer of the target architecture using JSPs and servlets. These web components communicated with business logic hosted on the WebSphere application server as EJBs and JavaBeans. The middle tier also communicated with CICS COBOL business modules running in a mainframe environment via IBM's External Call Interface (ECI) middleware.
Environment: JDK 1.1.8, JDK 1.3, JSP, Servlet, JavaBeans, EJBs, JDBC, Swing, J2EE Server, IBM WebSphere 3.02, Visual Age for Java, DB2, HPS 5.3, and HTML.
Jr Java Developer
- Involved in preparing the project's Technical Design Document.
- Designed and developed the application using JSP custom tags, Struts tags, and JSTL tag libraries.
- Developed controller servlets, Action and ActionForm objects for interacting with the Sybase database using Struts.
- Implemented a service-oriented architecture (SOA) enabling different applications to exchange data for business processes.
- Used and configured Struts DynaActionForms, MessageResources, ActionMessages, ActionErrors, validation.xml, and validator-rules.xml.
- Followed Agile methodology (TDD, Scrum) to produce high-quality software and satisfy customers.
- Wrote build and deployment scripts using shell, Perl, and Ant.
- Wrote stored procedures and database triggers using PL/SQL.
- Used IBM WebSphere MQ Series connections with AS/400 (IBM System i).
- Used JBoss Application Server for deploying and testing code.
- Developed reporting functionality in Excel using Jakarta POI.
- Built prototypes using Macromedia Dreamweaver.
- Designed the network diagram and set up the development, SIT, and UAT environments by installing and configuring WebLogic Application Server on UNIX.
- Responsible for integrating the application with CICS for real-time search and retrieval.
- Used SAX and DOM for parsing XML documents and XSLT for transformation.
- Developed EJBs (session beans) for implementing business logic and transactional services.
- Developed a MessageHandler adapter that converts data objects into XML messages and invokes an enterprise service, and vice versa, using Java, JMS, and MQ Series.
- Responsible for preparing use cases, class and sequence diagrams for the modules using UML.
- Developed the data access layer to interact with the backend using the Hibernate framework.
- Wrote JUnit classes for the services and prepared documentation.
- Developed data access objects to access middleware web services as well as the Sybase database.
- Integrated various modules and deployed them in WebSphere Application Server.