- 7+ years of IT experience in analysis, design, and development in Big Data, Scala, Spark, Hadoop, and HDFS environments, along with experience in Java and J2EE.
- Experienced in developing and implementing MapReduce programs on Hadoop to process Big Data as per requirements.
- Excellent experience with Scala, Apache Spark, Spark Streaming, pattern matching, MapReduce, RDDs (Resilient Distributed Datasets), and frameworks such as Lift and Play.
- Extensive ETL testing experience using Informatica 8.1/7.1/6.2 (PowerCenter/PowerMart), including Designer, Workflow Manager, Workflow Monitor, and Server Manager.
- Developed Talend ETL test scripts based on technical specifications/Data design documents and Source to Target mappings.
- Experienced in developing core Java code; strong in design, software processes, requirements gathering, analysis, and development of software applications in the roles of Programmer Analyst and Big Data Developer.
- Good knowledge of advanced Java topics such as generics, collections, and multithreading.
- Increased the accuracy, stability, and automation of the reverse DNS creation process.
- Ensured that users have continuous access to e-mail, the intranet, the public Internet, and other crucial business applications.
- Knowledge of networking technologies such as TCP/IP, DNS, and web servers.
- Excellent experience in Apache Hadoop ecosystem components such as Hadoop Distributed File System (HDFS), MapReduce, Hive, Sqoop, Maven, HBASE, PIG, Kafka, ZooKeeper, Scala, Flume, Storm, and Oozie.
- Good knowledge of Hadoop architecture and its components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, MRv1, and MRv2 (YARN).
- Experienced in developing MapReduce jobs in Java for data cleansing, transformations, pre-processing, and analysis. Implemented multiple mappers to handle data from multiple sources.
- Experienced with Spark and Scala, including Spark SQL, Spark Streaming, Spark GraphX, and Spark MLlib.
- Experienced in installing, configuring, and administering Hadoop clusters on major Hadoop distributions such as Hortonworks and Cloudera.
- Experienced with Hadoop daemon functionality, resource utilization, and dynamic tuning to keep clusters available and efficient.
- Expertise in writing custom UDFs for extending Hive and Pig core functionality.
- Experienced in setting up data gathering tools such as Flume and Sqoop.
- Experienced in working with Flume to load the log data from multiple sources directly into HDFS.
- Excellent knowledge of NoSQL databases such as Cassandra, MongoDB, and HBASE.
- Experienced in Hive partitioning and bucketing, performing different types of joins on Hive tables, and implementing Hive SerDes such as Regex, JSON, and Avro.
- Experienced in UNIX shell scripting. Experienced in analyzing, designing, and developing ETL strategies and processes, writing ETL specifications, and Informatica development.
- Worked extensively on Talend ETL mappings and on analysis and documentation of OLAP report requirements. Good understanding of OLAP concepts, especially when working with large data sets.
- Experienced in Dimensional Data Modeling using star and snowflake schema.
- Good knowledge of data mining and machine learning techniques. Proficient in Oracle … SQL and PL/SQL.
- Experienced in integrating various data sources such as Oracle, DB2, Sybase, SQL Server, and MS Access, and non-relational sources such as flat files, into a staging area.
- Experienced in large cross-platform applications using Java and J2EE, with experience in core Java concepts such as OOP, multithreading, collections, and I/O.
- Experienced with applications using Java, RDBMS, and Linux shell scripting.
- Good interpersonal, communication, and problem-solving skills; a motivated team player.
- Able to make a valuable contribution to the company.
Hadoop Ecosystem: Hadoop, MapReduce, Sqoop, Hive, Oozie, Pig, HDFS, ZooKeeper, Flume, HBASE, Impala, Spark, Storm, Hadoop distributions (Cloudera, Hortonworks, and Pivotal).
NoSQL Databases: HBASE, Cassandra, MongoDB
Java & J2EE Technologies: Core Java, Servlets, JSP, JDBC, NetBeans, Eclipse
Languages: Java, SAS, Scala and Apache Spark, SQL, PL/SQL, Pig Latin, HiveQL, UNIX
Databases: Oracle … MySQL, DB2, MS SQL Server
Application Servers: Apache Tomcat, JBoss, IBM WebSphere, WebLogic
Web Services: WSDL, SOAP, REST
Methodologies: Agile, Scrum
Confidential - San Diego, CA
Sr. Hadoop Developer
- Responsible for building scalable distributed data solutions using Hadoop and migrating legacy retail application ETL to Hadoop.
- Wrote Spark code in Scala to connect to HBASE and read/write data to HBASE tables.
- Extracted data from different databases and copied it into HDFS using Sqoop; used compression techniques to optimize data storage.
- Involved in designing and deploying multiple applications utilizing almost all of the Amazon Web Services (AWS) stack (including EC2, EBS, internal ELB, Route 53, S3, and IAM), focusing on high availability, fault tolerance, and auto scaling.
- Experienced in migrating projects to the AWS cloud and designing the architecture for moving code to the cloud.
- Worked with Java 8 development in Eclipse. Used various core Java concepts such as collections and multithreading for complex data computations and analysis.
- Developed business and transaction services using Servlets and core Java concepts such as multithreading, ConcurrentHashMap, and I/O streams.
- Excellent knowledge of Hadoop ecosystem components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, and the MapReduce programming paradigm.
- Designed and implemented a disaster recovery (DR) strategy for several cloud environments.
- Performed functional, non-functional, and performance testing of key systems prior to cutover to AWS.
- Ran Apache Hadoop, CDH, and Elastic MapReduce (EMR) on EC2.
- Migrated an existing on-premises application to AWS. Used AWS services such as EC2 and S3 for processing and storing small data sets. Maintained the Hadoop cluster on AWS EMR.
- Implemented ETL code to load data from multiple sources into HDFS using Pig scripts.
- Involved in file movement between HDFS and AWS S3 and worked extensively with S3 buckets in AWS. Used a Big Data tool to load large volumes of source files from S3 to Redshift.
- Used different SerDes for converting JSON data into pipe-separated data.
- Optimized MapReduce jobs to use HDFS efficiently by using various compression mechanisms.
- Created Spark Streaming code to take source files as input. Used Oozie workflows to automate all the jobs.
- Developed Spark programs using Scala, created Spark SQL queries, and developed Oozie workflows for Spark jobs.
- Designed and developed Scala workflows to pull data from cloud-based systems and apply transformations to it. Coordinated the development of Jenkins jobs.
- Built analytics for structured and unstructured data and managed large-scale data ingestion using Avro, Flume, Kafka, and Sqoop.
- Worked on scalable distributed computing systems, software architecture, data structures, and algorithms using Hadoop, Apache Spark, and Apache Storm.
- Ingested streaming data into Hadoop using Spark, the Storm framework, and Scala.
- Developed a banker's rounding UDF for Hive/Pig to implement Teradata-style rounding in Hive/Pig.
- Continuously monitored and managed the Hadoop cluster using Cloudera Manager.
Environment: Hadoop, MapReduce, AWS, S3, Redshift, HDFS, Hive, Pig, Spark, Storm, Flume, Kafka, Sqoop, Java 8, Oozie, Impala, SQL, Scala, Java (JDK 1.6), Hadoop (Cloudera) and Eclipse.
Confidential - Des Moines, Iowa
Sr. Big Data Developer
- Responsible for building scalable distributed data solutions using Hadoop and migrating legacy retail application data to Hadoop.
- Installed and configured Hive, Pig and Sqoop on the HDP 2.0 cluster.
- Performed real-time analytics on HBase using the Java API and fetched data to/from HBase by writing MapReduce jobs. Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and processing using HDP 2.0.
- Wrote SQL queries to process the data using Spark SQL. Worked extensively on importing metadata into Hive, migrated existing tables and applications to work on Hive, and made the data available.
- Extracted data from different databases and copied it into HDFS using Sqoop.
- Worked extensively on importing metadata into Hive using Python and migrated existing tables and applications to work on Hive.
- Worked on a project to retrieve log messages using Spark Streaming.
- Designed Oozie jobs for automated processing of similar data. Collected the data using Spark Streaming.
- Analyzed the data by performing Hive queries and running Pig scripts to understand user behavior.
- Worked with Spark Streaming to ingest data into the Spark engine. Extensively used FORALL and BULK COLLECT to fetch large volumes of data from tables.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Involved in loading and transforming large sets of structured, semi-structured, and unstructured data from relational databases into HDFS using Sqoop imports.
- Responsible for analyzing and cleansing raw data by performing Hive queries and running Pig scripts on data.
- Coordinated and planned with application teams on MongoDB capacity planning for new applications.
- Created aggregation queries for reporting and analysis. Collaborated with development teams to define and apply best practices for using MongoDB.
- Built a data lake ecosystem using Hadoop technologies such as Hive, HBASE, MapReduce, Pig, HDFS, Scala, and Spark to ingest and process Kinesis data streams.
- Developed a Spark application to filter JSON source data and store it in HDFS with partitions; used Spark to extract the schema of JSON files.
- Imported data from different sources such as Talend ETL and the local file system into Spark RDDs.
- Responsible for importing data from MySQL to HDFS and providing query capabilities using Hive.
- Used Sqoop to import data from RDBMS to the Hadoop Distributed File System (HDFS) and later analyzed the imported data using Hadoop components.
- Developed Sqoop scripts to enable interaction between Pig and the MySQL database.
- Involved in writing shell scripts for scheduling and automation of tasks.
- Managed and reviewed Hadoop log files to identify issues when jobs fail.
Environment: Hadoop, MapReduce, HDFS, Jenkins, MongoDB, Hive, Pig, Spark, Storm, Kafka, Flume, Sqoop, Oozie, SQL, Scala, Java (JDK 1.6), Hadoop (Hortonworks HDP 2.0) and Eclipse.
Confidential - Houston, TX
- Gathered business requirements from the business partners and subject matter experts and prepared the Business Requirements Document.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Good exposure to setting up MongoDB environments for different use cases, including creating and deploying MongoDB instances and clusters from a central repository.
- Experienced in fixing MongoDB slave replication lag issues and in MongoDB profiling and logging.
- Wrote entities in Scala and Java along with named queries to interact with the database.
- Analyzed the data by performing Hive queries and Pig scripts to study customer behavior.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Installed and configured Cognos 8.4/10 in single- and multi-server environments.
- Worked with Spark Streaming, which collects data from Kafka in near real time, performs the necessary transformations and aggregations on the fly to build the common learner data model, and persists the data in a NoSQL store (HBASE).
- Worked on different file formats such as Sequence files, XML files, and Map files using MapReduce programs.
- Used UDFs to implement business logic in Hadoop.
- Created HBASE tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
- Used the NoSQL database Cassandra for information retrieval.
- Implemented regression analysis using MapReduce.
- Developed scripts and Batch Jobs to schedule various Hadoop programs using Oozie.
- Imported/exported data between RDBMS and HDFS using data ingestion tools such as Sqoop.
- Tested MapReduce code using JUnit.
- Used Cloudera Manager to monitor the health of jobs running on the cluster.
Environment: Java (JDK 1.6), Hadoop, MapReduce, MongoDB, Pig, Hive, Scala, Spark, Cassandra, Sqoop, Oozie, HDFS, Hadoop (Cloudera), MySQL, Eclipse, Oracle.
- Prepared high-level and low-level design documents applying applicable design patterns, with UML diagrams depicting components and class-level details.
- Interacted with the system analysts and business users for design and requirement clarification.
- Developed Web Services using SOAP, SOA, WSDL, and Spring MVC; developed DTDs and XSD schemas for XML (parsing, processing, and design) to communicate with the Active Directory application using a RESTful API.
- Developed JSPs according to requirements.
- Excellent knowledge of NoSQL databases, including MongoDB and Cassandra.
- Developed integration services using Web Services, SOAP, and WSDL.
- Designed, developed, and maintained the data layer using the Hibernate ORM framework.
- Involved in analysis, design, development, and production of the application, and developed UML diagrams.
- Presented top-level design documentation to various groups for the transition.
- Used the Spring Framework's JMS support for writing to JMS queues and HibernateDaoSupport for interfacing with the database, and integrated Spring with JSF.
- Wrote AngularJS controllers, views, and services.
- Used Ant for builds and deployed the application on the JBoss application server.
- Developed HTML reports for various modules as per the requirement.
- Translated known information into concrete concepts and technical solutions.
- Assisted in writing the SQL scripts to create and maintain the database, roles, users, and tables in SQL Server.
Jr. Java Developer
- Analyzed the object-oriented design and presented it using UML sequence and class diagrams.
- Managed database connectivity using JDBC for querying, inserting, and data management, including triggers and stored procedures.
- Developed components using Java multithreading.
- Developed various EJBs (session and entity beans) for handling business logic and data manipulations from the database.
- Involved in the design of JSPs and Servlets for navigation among the modules.
- Designed the cascading style sheets, XSLT, and XML parts of the Order Entry and Product Search modules, and implemented client-side validations with JavaScript.
- Hosted the application on WebSphere.