- Passionate developer with over 9 years of experience developing, configuring, and implementing Hadoop and Big Data ecosystems on various platforms, as well as developing and documenting applications using Java.
- Proficient in analyzing and developing Hadoop and Big Data environments based on customers' technical requirements.
- Extensive working experience with the Hadoop ecosystem, including Kerberos integration and components such as MapReduce (MRv1 and MRv2), Hive, Pig, Sqoop, Oozie, Kafka, and ZooKeeper.
- Expertise in writing MapReduce programs using the Apache Hadoop API on Cloudera and Hortonworks distributions to process structured and unstructured data.
- Hands-on expertise in ETL tools for big data integration, including importing and exporting data between relational database systems (RDBMS) and HDFS using Sqoop.
- Experienced in developing UDFs for Pig and Hive in Java to extend their core functionality.
- Good experience in data cleansing and analysis using Hive, Pig, and the Hadoop Java API.
- Good knowledge of setting up job streaming and scheduling with Oozie, and of working with messaging systems such as Kafka integrated with ZooKeeper.
- Knowledge of installing Hadoop clusters, disaster recovery planning, configuration, and performance tuning.
- Experience with Palantir technologies such as Slate and Contour, using Mesa as the programming language.
- Extensive experience with RESTful web services, JMS, Spring, and Hibernate.
- Strong experience working with BI tools such as Tableau, QlikView, and Pentaho, and implementing them in Big Data ecosystems according to requirements.
- Comprehensive knowledge of software development using shell scripting, core Java, and web technologies.
- Highly skilled in Spark Streaming and Scala, with sound knowledge of Spark SQL and Spark GraphX.
- Sound knowledge of real-time data streaming solutions using Apache Spark Streaming, Kafka, and Flume.
- Good knowledge of cloud integration with Amazon Simple Storage Service (S3), Amazon Elastic MapReduce (EMR), Amazon Elastic Compute Cloud (EC2), and Microsoft Azure HDInsight.
- Experienced in developing simple to complex MapReduce jobs using Hadoop technologies to handle files in multiple formats (JSON, Parquet, XML, Avro, 834 files, etc.).
- Hands-on experience verifying cleansed data in Talend; also worked with the Talend Administration Center to add users and schedule jobs.
- Able to work independently or in a team; dedication combined with strong communication skills helps meet deadlines.
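The data-cleansing UDF work mentioned above was done in Java inside Pig and Hive; as an illustration only, a minimal Python sketch of the kind of per-field cleansing logic such a UDF typically implements (all names here are hypothetical, not from the original code):

```python
# Hypothetical sketch of per-field cleansing logic, the kind of work a
# Pig/Hive UDF performs on each column value before downstream analysis.
# The markers and rules below are illustrative assumptions.
NULL_MARKERS = {"", "NULL", "null", "N/A", "\\N"}

def clean_field(raw):
    """Trim whitespace, map common null markers to None, normalize case."""
    value = raw.strip()
    if value in NULL_MARKERS:
        return None
    return value.lower()

def clean_record(fields):
    """Apply clean_field to every column of a delimited record."""
    return [clean_field(f) for f in fields]
```

For example, `clean_record(["  Foo ", "N/A", "BAR"])` yields `["foo", None, "bar"]`, so downstream Hive or Pig jobs see consistent casing and real nulls instead of sentinel strings.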
Big Data Ecosystems: HDFS, MapReduce, YARN, ZooKeeper, Hive, Pig, Oozie, Sqoop, Kafka, Storm, Apache Spark, Apache Tez, Splunk, Impala, NiFi, Flume, Apache Solr, Kylin, Apache Zeppelin
Programming Languages: Java, Python, Scala, R
Databases: MySQL, SQL Server, DB2, Teradata, Netezza, Postgres, Greenplum
Hadoop Architectures: MRv1 and YARN
NoSQL: Cassandra, HBase, MongoDB
Build Management Tools: Ant, Maven, Stack, SBT (Scala Build Tool)
Continuous Integration Tools: Jenkins, Hudson, Snap CI, Bamboo
Business Intelligence Tools: Tableau, QlikView, Pentaho, Zoomdata, Power BI
Operating Systems: Linux (Red Hat 5/6.5, CentOS 6/7, Ubuntu), Windows
Web Technologies: HTML, Servlets, JSP, JavaScript
Cloud Technologies: Amazon Web Services (AWS), CDH5, HDP-2.5, Hortonworks, Pivotal
Confidential, Long Beach, CA
- Involved in requirement analysis, architecture of the EDP (Enterprise Data Platform), design, and coding.
- Played a major role in building a near-real-time data pipeline that brings data from a transactional MySQL server into Hadoop.
- Worked on a logging framework around the data pipeline into Hadoop for future analysis.
- Responsible for data integrity, data validation, and data certification checks to provide quality data for downstream applications.
- Worked on implementing Spark Streaming as the streaming data ingestion framework, with Kafka as the messaging service pushing the data to HDFS.
- Worked on building a data pipeline using Talend Big Data as an ETL tool, including custom Java code that validates the data in the pipeline.
- Wrote Spark SQL, Hive, and Pig scripts for data analysis and processing.
- Involved in building a validation framework around the data in the Enterprise Data Platform.
- Worked on Apache Kylin, a cube implementation on Hadoop for extremely large datasets.
- Developed more efficient techniques for bulk loading huge data sets into HBase and Phoenix tables, achieving industry-standard benchmarks.
- Worked with Mesa as the programming language to perform data operations in Slate and Contour (Palantir Technologies' proprietary tools).
- Worked on setting up an HDP cluster, optimizing and fine-tuning the memory requirements of the various environments based on Enterprise Data Platform requirements.
- Worked on bringing unstructured data (image files and email data) into HDFS and automating the process, delivering business value by making the claims process much more efficient than traditional systems.
- Worked on cleaning and parsing different file formats such as Parquet, Avro, CSV, and fixed-length files.
- Worked on publishing and deploying code to Nexus and scheduling jobs through Autosys.
Environment: HDP-2.5, CDH-5.12, Kafka, Hive, Tez, HBase, Apache Phoenix, Apache Kylin, Talend Big Data, Spark SQL, Apache Zeppelin, Apache Pig, ZooKeeper, Maven, Gitblit.
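The validation work in this project was custom Java code inside Talend; purely as an illustration, a standalone Python sketch of the routing pattern such a validation framework uses, where each record is either passed downstream or rejected with a reason (the field names and rules are hypothetical):

```python
# Hypothetical sketch of the validate-and-route step of a data pipeline:
# good records continue downstream, bad ones are captured with a reason.
# The schema below is an illustrative assumption, not the original one.
REQUIRED_FIELDS = ("claim_id", "member_id", "amount")

def validate(record):
    """Return (True, record) for good records, (False, reason) otherwise."""
    for field in REQUIRED_FIELDS:
        if not record.get(field):
            return False, "missing " + field
    try:
        float(record["amount"])  # certification check: amount must be numeric
    except ValueError:
        return False, "non-numeric amount"
    return True, record

def split_records(records):
    """Route records into valid/rejected streams, as the framework would."""
    valid, rejected = [], []
    for rec in records:
        ok, result = validate(rec)
        (valid if ok else rejected).append(result)
    return valid, rejected
```

Keeping the rejected stream (with reasons) rather than silently dropping bad rows is what makes the downstream data certifiable: every input record is accounted for on one side or the other.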
Confidential, Sunnyvale, CA
- Played a key role in discussing requirements and analyzing the entire system, along with estimation, development, and testing, keeping BI requirements in mind.
- Involved in developing and installing Sqoop, Hive, and FTP integrations to downstream systems.
- Imported data from various sources into the Cassandra cluster using Sqoop.
- Built a Sqoop job to import data in different formats, such as Avro and XML, obtained from different vendors.
- Involved in writing MapReduce Java code to process data from HBase tables.
- Developed UNIX shell scripts for business processes and assimilation of data from different interfaces.
- Developed Sqoop scripts to write the processed data into HBase tables, which helps the BI team with data visualization.
- Established an Oozie component to implement a job scheduler that runs on a daily basis.
- Also involved in developing a pipeline to load data into tables using Spark Streaming and Kafka, integrated with ZooKeeper.
- Involved in developing Sqoop scripts that load data from different interfaces into HDFS.
- Developed Scala code for reading multiple data formats on HDFS.
- Worked on debugging and performance tuning of MapReduce, Hive, and Sqoop jobs.
- Involved in diagnosing possible ways to optimize and improve the efficiency of the system.
- Developed multiple POCs in Scala, deployed them on the cluster, and compared the performance of Spark with MapReduce.
- Developed Spark code in Scala that generates Spark RDDs for faster transformations.
- Involved in creating and maintaining technical documentation for the MapReduce, Hive, Sqoop, and UNIX jobs and the Hadoop clusters, and reviewing it to fix post-production issues.
Environment: Red Hat Enterprise Linux, HBase, Solr, Kafka, MapReduce, Hive, Java SDK, Python, DB2, Sqoop, Spark, Scala, SBT, Akka, Maven, GitHub.
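The multi-format reading mentioned above was Scala code on HDFS; as a standalone illustration of the dispatch-by-format idea behind it, a small Python sketch that parses CSV and newline-delimited JSON into one common record shape (function names and formats chosen for illustration):

```python
# Hypothetical sketch: parse several input formats into a common record shape
# (list of dicts), dispatching on a declared format name. Illustrative only;
# the original work read these formats from HDFS in Scala.
import csv
import io
import json

def parse_csv(text, columns):
    """Parse CSV text into dicts keyed by the given column names."""
    return [dict(zip(columns, row)) for row in csv.reader(io.StringIO(text))]

def parse_json_lines(text):
    """Parse newline-delimited JSON records."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]

PARSERS = {"csv": parse_csv, "jsonl": parse_json_lines}

def read_records(fmt, text, columns=None):
    """Dispatch to the right parser based on the declared format."""
    if fmt == "csv":
        return PARSERS[fmt](text, columns)
    return PARSERS[fmt](text)
```

Normalizing every source into the same record shape at the edge is what lets one set of downstream transformations serve all vendors regardless of the format they deliver.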
- Involved in setting up Hadoop clusters, configuration, monitoring, and performance tuning.
- Installed and configured various Hadoop components such as Hive, Pig, Sqoop, and Oozie.
- Played a key role in developing and designing a data management system using MySQL, and worked with CSV and JSON files while retrieving the data.
- Hands-on experience with cloud services such as AWS.
- Managed and developed workflows in Oozie that automate data loading into HDFS and pre-processing with Pig.
- Analyzed data by performing bucketing and partitioning in Hive and by writing Pig Latin scripts to understand customer behavioral patterns.
- Developed both frontend and backend modules using Python and the ASP.NET web framework.
- Mentored the team in developing Hive queries and connecting Hive tables through the ODBC connector to populate report data in the front end.
- Implemented Storm integration with Kafka and ZooKeeper for processing real-time data.
- Played a key role in exporting and analyzing data from the relational databases and generating reports for the BI team.
- Worked on importing some of the data from NoSQL databases, including HBase and Cassandra.
- Monitored system health, reports, and logs to act swiftly in case of failure, and alerted the team regarding failures.
Environment: Amazon Web Services, MapReduce, Cassandra, Apache Solr, Hive, Java, Pig, Sqoop, Azure, XML, HBase, ASP.NET, Python, Pig Latin.
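The Hive bucketing used in this project assigns each row to bucket `(hash(key) & Integer.MAX_VALUE) % numBuckets`; for ASCII string keys the hash agrees with Java's `String.hashCode`. A standalone Python sketch of that assignment (illustrative only; Hive performs this internally):

```python
# Sketch of Hive-style bucket assignment for a string clustering key.
# java_string_hash mimics Java's String.hashCode (valid for ASCII input,
# where Hive's string hash agrees with it); bucket_for applies the
# (hash & Integer.MAX_VALUE) % numBuckets rule.
def java_string_hash(s):
    """Replicate Java's 32-bit String.hashCode in Python."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # reinterpret as a signed 32-bit int, as Java does
    return h - 0x100000000 if h >= 0x80000000 else h

def bucket_for(key, num_buckets):
    """Bucket index for a string key under Hive-style bucketing."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_buckets
```

Because the bucket is a pure function of the key, two bucketed tables clustered the same way can be joined bucket-by-bucket, which is what makes bucketed joins and sampling in Hive efficient.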
Confidential, Englewood, CO
- Played a key role in installing, managing, and supporting Linux operating systems such as CentOS and Ubuntu.
- Involved in setting up Amazon Web Services, clusters, and S3 data buckets, with Hive scripts to process Big Data.
- Installed and monitored Hadoop Cloudera CDH5 and set up the Cloudera distribution system.
- Played a key role in implementing single sign-on solutions using Kerberos.
- Played an important role in extracting data from different data sources and loading it into Hadoop clusters.
- Played a major role in writing complex Hive queries to move data from the databases to the Hadoop Distributed File System.
- Involved in creating automation jobs to process data from different data sources into the cluster using Oozie.
- Set up and monitored the development and production environments.
Environment: Hadoop, Oracle 11g, Maven, Pig, JUnit, XML, Hive, HTML, Eclipse IDE, AWS, JDBC, SQL.
- Played an important role in gathering all the requirements for the project.
- Developed UI screens using Struts tags in JSP.
- Extended standard action classes provided by the Struts framework to handle client requests.
- Developed a web services client interface with JAX-RPC from WSDL files for invoking methods using SOAP.
- Developed RESTful web services for the downstream systems.
- Used Oracle 11g as the database for the application, which was hosted in the cloud using Amazon Web Services.
- Developed the application using Eclipse, using it for editing, debugging, formatting, and build automation.
- Used Ajax to provide ease of use by supplying data while the user fills out forms in the application.
- Developed Ant scripts for the build process and deployed to WebLogic Server.
- Development and implementation of the tool.
- Developed JSP custom tags to display data.
- Maintained versions using CVS.
- Implemented UNIX shell scripting.
- Coordinated with the testing team to find and fix bugs before production.
- Implemented test cases and tested the changed application.
Environment: Java, J2EE, Servlets, Struts, JUnit, EJB, BEA WebLogic, JDBC, SQL, UNIX