- 8+ years of professional IT experience across all phases of the Software Development Life Cycle, including hands-on experience with Java/J2EE technologies and Big Data analytics.
- Around 4 years of experience in ingestion, storage, querying, processing and analysis of Big Data, with hands-on experience in Hadoop ecosystem development including MapReduce, HDFS, Hive, Pig, Spark, Cloudera Navigator, Mahout, HBase, ZooKeeper, Sqoop, Flume, Oozie and AWS.
- Good understanding of R Programming, Data Mining and Machine Learning techniques.
- Strong experience and knowledge of real-time data analytics using Storm, Kafka, Flume and Spark.
- Experience developing Spark jobs in Scala in a test environment for faster data processing, using Spark SQL for querying.
- Experience in troubleshooting errors in HBase Shell, Pig, Hive and MapReduce.
- Involved in upgrading existing MongoDB instances from version 2.4 to version 2.6 by upgrading the security roles and implementing newer features.
- Extensive experience working with Teradata, Oracle, Netezza, SQL Server and MySQL databases.
- Excellent understanding and knowledge of NoSQL databases like MongoDB, HBase and Cassandra.
- Strong experience working with different Hadoop distributions like Cloudera, Hortonworks, MapR and Apache distributions.
- Experience installing, configuring, supporting and managing Hadoop clusters using Apache and Cloudera (CDH 5.x) distributions and on Amazon Web Services (AWS).
- Experience with cloud and data lake platforms such as AWS and Azure.
- Experience with AWS services such as EMR, EC2, S3, CloudFormation and Redshift, which provide fast and efficient processing of Big Data.
- Implemented custom Kafka encoders for custom input formats to load data into Kafka partitions; streamed data in real time using Spark with Kafka for faster processing.
- Responsible for performing reads and writes in Cassandra from a web application using Java JDBC connectivity.
- Experience in extending Hive and Pig core functionality with custom UDFs and UDAFs.
- Debugged MapReduce jobs using counters and MRUnit testing.
- Expertise in writing real-time processing applications using spouts and bolts in Storm.
- Experience in configuring various topologies in Storm to ingest and process data on the fly from multiple sources and aggregate it into a central Hadoop repository.
- Good understanding of Spark Streaming with Kafka for real-time processing.
- Used Solr indexing to enable search on non-primary-key columns in Cassandra keyspaces.
- Experienced in migrating ETL transformations and join operations using Pig Latin scripts.
- Good understanding of MPP databases such as HP Vertica, Greenplum and Impala.
- Good knowledge of streaming data from sources such as log files, JMS and application sources into HDFS using Flume sources.
- Extensive experience in middle-tier development using J2EE technologies like JDBC, JNDI, JSP, Servlets, JSF, Struts, Spring, Hibernate, EJB.
- Extensive experience working with SOA-based architectures, using REST-based web services with JAX-RS and SOAP-based web services with JAX-WS.
- Experience with version control tools such as SVN and Git (GitHub), JIRA/Mingle for issue tracking, and Crucible for code reviews.
- Worked with various tools and IDEs such as Eclipse, IBM Rational, Visio, Apache Ant, MS Office, PL/SQL Developer and SQL*Plus.
- Experience in importing and exporting data using Sqoop from HDFS to Relational Database Systems and vice-versa.
- Worked on Docker-based containerized applications.
- Knowledge of data warehousing and ETL tools like Informatica, Talend and Pentaho.
- Experienced in monitoring cluster status using Cloudera Manager, Ambari and Ganglia.
- Experience testing MapReduce programs using MRUnit and JUnit.
- Experience with application servers such as JBoss/Tomcat, WebLogic and IBM WebSphere.
Big Data Ecosystem: Hadoop, MapReduce, Pig, Hive, YARN, Kafka, Flume, Sqoop, Impala, Oozie, ZooKeeper, Spark, Solr, Storm, Drill, Ambari, Mahout, MongoDB, Cassandra, Avro, Parquet and Snappy.
Hadoop Distributions: Cloudera, MapR, Hortonworks, IBM BigInsights
NoSQL Databases: Cassandra, MongoDB and HBase
Java Technologies: Servlets, JavaBeans, JSP, JDBC, JNDI, EJB and Struts
XML Technologies: XML, XSD, DTD, JAXP (SAX, DOM), JAXB
Development / Build Tools: Eclipse, Ant, Maven, Gradle, IntelliJ, JUnit and Log4j
Frameworks: Struts, Spring and Hibernate
App/Web servers: WebSphere, WebLogic, JBoss and Tomcat
DB Languages: MySQL, PL/SQL, PostgreSQL and Oracle
RDBMS: Teradata, Oracle 9i, 10g, 11g, MS SQL Server, MySQL and DB2
Operating systems: UNIX, LINUX, Mac OS and Windows Variants
Data analytical tools: R, SAS and MATLAB
ETL Tools: Tableau, Talend, Informatica, Pentaho
Confidential, Emeryville, CA.
Hadoop/Big Data Developer
- Wrote Spark jobs to transform data before files were moved to S3 storage.
- Optimized indexing and deletion of customer subscription records based on customer ID.
- Deployed Spark jobs on an EMR cluster in AWS.
- Implemented Airflow DAGs in Python to schedule a series of pipeline job steps, including cluster creation, Spark job execution, S3 data operations and Slack notifications.
- Worked on a large-scale Hadoop YARN cluster for distributed data processing and analysis using connectors, Spark Core, Spark SQL, Sqoop, Pig, Hive and NoSQL databases.
- Used the pytest framework to implement unit test cases for Python code.
- Implemented per-environment and per-project Slack notifications from Airflow DAGs using the Slack API.
- Developed Spark scripts using Scala shell commands as required.
- Involved in creating Hive tables, loading them with data and writing Hive queries.
- Worked with MySQL databases and SQL queries using SQL Workbench/J.
- Debugged and analyzed Spark jobs and Airflow DAG executions.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL and DataFrames.
- Developed Python scripts to automate data transfer to and from S3.
- Performed data quality validation checks within Spark file-processing jobs.
- Debugged the Spark Streaming application when abnormal data caused interruptions to TV programs.
- Used Spark listener programming to automate analytics on customer data.
- Developed Spark code using Scala and Spark SQL/Streaming for faster testing and processing of data.
- Created queues in the YARN queue manager to share cluster resources among user-submitted MapReduce jobs.
- Worked extensively with Hive and HBase for data validation and analysis.
- Imported data from S3 into Spark RDDs for various transformations and actions.
- Involved in performance tuning of Spark applications for data analytics and processing.
- Used Amazon SageMaker with Spark for machine learning.
- Worked on data modeling and design of Hive and HBase Table structures based on the project reporting and analytic needs.
- Configured Spark SQL to use the AWS Glue Data Catalog as its metastore.
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, Airflow, Python, Scala, AWS EC2, Spark, IntelliJ IDEA, Unix/Linux, S3, SQL, Git, Jira, Slack.
Confidential, Mountain View, CA.
Sr. Hadoop Developer
- Developed efficient MapReduce programs for filtering out the unstructured data and developed multiple MapReduce jobs to perform data cleaning and preprocessing on Hortonworks.
- Implemented a data interface to retrieve customer information via REST API, pre-processed the data using MapReduce 2.0 and stored it in HDFS (Hortonworks).
- Extracted data from MySQL, Oracle and Teradata 2 through Sqoop 1.4.6, placed it in HDFS (Cloudera distribution) and processed it.
- Worked with various HDFS file formats such as Avro 1.7.6, SequenceFile and JSON, and compression formats such as Snappy and bzip2.
- Proficient in designing row keys and schemas for the NoSQL database HBase, with knowledge of another NoSQL database, Cassandra.
- Used Hive to validate data ingested through Sqoop and Flume, and pushed the cleansed data set into HBase.
- Good understanding of Cassandra Data Modeling based on applications.
- Wrote ETL jobs to read from web APIs using REST and HTTP calls and loaded the data into HDFS using Java and Talend.
- Developed Pig 0.15.0 UDFs to pre-process data for analysis and migrated ETL operations into the Hadoop system using Pig Latin scripts and Python 3.5.1 scripts.
- Used Pig as an ETL tool for transformations, event joins, filtering and some pre-aggregations before storing the data in HDFS.
- Troubleshot, debugged and fixed Talend issues while maintaining the health and performance of the ETL environment.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Used Spark to parse XML files, extract values from tags and load them into multiple Hive tables.
- Experienced in running Hadoop streaming jobs to process terabytes of formatted data using Python scripts.
- Developed small distributed applications in our projects using ZooKeeper 3.4.7 and scheduled the workflows using Oozie 4.2.0.
- Proficient in writing Unix/Linux shell commands.
- Developed an SCP simulator that emulates intelligent-network behavior and interacts with the SSF.
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Kafka, Flume, Oracle 11g, Core Java, Spark, Scala, Cloudera HDFS, Talend, Eclipse, Node.js, Unix/Linux, AWS, jQuery, Ajax, Python, Perl, ZooKeeper.
Confidential, Charlotte, NC.
- Wrote multiple Spark jobs to perform data quality checks before files were moved to the data processing layer.
- Implemented Spark using Scala and Spark SQL for faster testing and processing of data. Responsible for ingesting data into the data lake.
- Designed and modified database tables and used HBase queries to insert and fetch data from tables.
- Moved log files generated from various sources to HDFS through Flume 1.7.0 for further processing.
- Deployed applications on AWS and maintained EC2 (Elastic Compute Cloud) and RDS (Relational Database Service) instances.
- Implemented the file validation framework, UDFs, UDTFs and DAOs.
- Strong experience working in UNIX/Linux environments and writing UNIX shell, Python and Perl scripts.
- Built a REST web service with a Node.js back-end server to handle requests sent from front-end jQuery Ajax calls.
- Imported and exported data between relational databases such as MySQL and HDFS/HBase using Sqoop.
- Involved in creating Hive tables, loading them with data and writing Hive queries.
- Modeled and created consolidated Cassandra, FiloDB and Spark tables based on data profiling.
- Used Oozie 1.2.1 operational services for batch processing and dynamic workflow scheduling, and created UDFs to store specialized data structures in HBase and Cassandra.
- Developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
- Used Impala to read, write and query the Hadoop data in HDFS from Cassandra and configured Kafka to read and write messages from external programs.
- Optimized existing Hadoop algorithms using SparkContext, Spark SQL, DataFrames and pair RDDs.
- Created a complete processing engine, based on the Cloudera distribution, tuned for performance.
Environment: Hadoop, HDFS, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Kafka, Flume, Oracle 11g, Core Java, FiloDB, AWS EC2, Spark, Scala, Cloudera HDFS, Eclipse, Web Services (SOAP, WSDL), Node.js, Unix/Linux, jQuery, Ajax, Python, Perl, ZooKeeper.
- Involved in analysis, design and coding in a Java/JSP front-end environment.
- Developed application using Spring, Servlets, JSP and EJB.
- Implemented MVC (Model View Controller) architecture.
- Designed the Application flow using Rational Rose.
- Used web servers like Apache Tomcat.
- Developed the user interfaces with the Spring tag libraries.
- Developed build and deployment scripts using Apache Ant to customize WAR, EAR and EJB JAR files.
- Prepared field-validation and scenario test cases using JUnit and tested the module in three phases: unit testing, system testing and regression testing.
- Coded and unit tested according to client standards.
- Wrote DB queries using SQL for interacting with database.
- Designed and developed XML processing components for dynamic menus in the application.
- Created components using Java, Spring and JNDI.
- Prepared Spring deployment descriptors using XML.
- Developed the entire application using Eclipse and deployed it on WebSphere Application Server.
- Developed a logging component using Apache Log4j to log messages and errors, and wrote JUnit test cases to verify the code under different conditions.
Environment: Java, HTML, Spring, JSP, Servlets, DBMS, Web Services, JNDI, JDBC, Eclipse, WebSphere, XML/XSL, Apache Tomcat, TOAD, Oracle, MySQL, JUnit, Log4j, SQL, PL/SQL, CSS.
- Involved in creation of a queue manager in WebSphere MQ along with the necessary WebSphere MQ objects required for use with WebSphere Data Interchange.
- Experience with SOAP web services and WSDL.
- Used Ant scripts to automate application build and deployment processes.
- Involved in design, development and modification of PL/SQL stored procedures, functions, packages and triggers to implement business rules in the application.
- Used RESTful web services with MVC for parsing and processing XML data.
- Developed ETL processes to load data from Flat files, SQL Server and Access into the target Oracle database by applying business logic on transformation mapping for inserting and updating records when loaded.
- Deployed web applications on Tomcat and JBoss server.
- Involved in creating the user authentication page using Java Servlets.
- Migrated data source passwords to encrypted passwords using Vault tool in all the JBoss application servers.
- Used the Spring Framework for dependency injection and integrated it with Hibernate.
- Used JMS for asynchronous communication between different modules.
- Actively involved in code reviews and in bug fixing.
- Followed agile software methodology for project development.