- 5+ years of IT experience, including data analysis and the Hadoop ecosystem
- Experience in components of Hadoop ecosystem including HDFS, MapReduce, Sqoop, Hive, Pig, HBase, Oozie, Flume, Kafka, Zookeeper, and Spark.
- Expertise in Hadoop Ecosystem, HDFS Architecture and Cluster technologies such as YARN Management, HDFS, HBase, MapReduce, Hive, Pig, Flume, Oozie, Sqoop, Zookeeper and Ranger
- Experience in developing software solutions to build out capabilities on a Big Data Platform
- Experience in configuring clusters, installing services, and monitoring clusters while resolving compatibility errors
- Experience in different Hadoop distributions like Cloudera and Hortonworks (HDP)
- Highly capable of processing large sets of structured, semi-structured and unstructured data and supporting Big Data applications.
- Experience with NoSQL databases like HBase, MapR and Cassandra as well as other ecosystem tools like Zookeeper, Oozie, Impala, Storm, Spark Streaming/SQL, Kafka, Hypertable, Flume
- Expertise in transferring data between a Hadoop ecosystem and structured data storage in an RDBMS such as MySQL, Oracle, Teradata and DB2 using Sqoop.
- Good experience with Hive concepts like static/dynamic partitioning, bucketing, managed and external tables, and join operations on tables.
- Proficient in building user defined functions (UDFs) in Hive and Pig to analyze data and extend HiveQL and Pig Latin functionality.
- Experience in implementing a unified data ingestion platform using Kafka producers and consumers.
- Proficient with Flume topologies for data ingestion from streaming sources into Hadoop
- Strong development experience with Agile methodology.
- Strong experience in distinct phases of the Software Development Life Cycle (SDLC) including Planning, Design, Development and Testing during the development of software applications.
- Excellent leadership, interpersonal, problem-solving and time-management skills.
- Excellent communication skills both written (documentation) and verbal (presentation).
- Responsible team player who can also work independently with minimal supervision.
Languages: C, C++, Java (Core), J2EE, ASP.NET, Python, Scala, UNIX Shell Scripting
Hadoop Ecosystem: MapReduce, HBase, Hive, Pig, Sqoop, Zookeeper, Oozie, Flume, Hue, Kafka, Spark SQL
Hadoop Distributions: Cloudera, Hortonworks, MapR
Database: MySQL, NoSQL, Oracle DB, Cassandra
Virtualization / Cloud: Amazon AWS, VMware, VirtualBox
Data Visualization: Power BI, Tableau
IDE: Eclipse, NetBeans, Visual Studio
Methodologies: Agile, SDLC
Hadoop Data Lake Developer
Confidential, Cleveland, OH
- Understanding the scope of the project and requirements gathering
- Using MapReduce to index large volumes of data for easy access to specific records
- Loading log data into HDFS using Flume
- Creating MapReduce jobs to power data for search and aggregation
- Writing Apache Pig scripts to process the HDFS data
- Writing MapReduce code for filtering data
- Creating Hive tables to store the processed results in a tabular format
- Developing Sqoop scripts to transfer data between Pig and Oracle
- Writing script files for processing data and loading to HDFS.
- Working with Sqoop for importing data from Oracle
- Utilizing Apache Hadoop ecosystem tools like HDFS, Hive and Pig for large datasets analysis
- Developing Pig and Hive UDFs to analyze complex data and find specific user behavior
- Using Pig for data cleansing and developing Pig Latin scripts to extract data from web server output files and load it into HDFS
- Developing MapReduce ETL in Java/Pig and data validation using Hive
- Working on Hive by creating external and internal tables, loading them with data and writing Hive queries.
- Creating HBase tables to store data from various sources
- Developing workflows in Oozie to automate the tasks of loading data into HDFS and pre-processing with Pig and Hive
- Working with various Hadoop file formats, including Text, SequenceFile, RCFile and ORC.
- Configuring Zookeeper for cluster coordination services
Environment: Hadoop, MapReduce, HDFS, Pig, Hive, HBase, Zookeeper, Flume, Kafka, Spark, Elasticsearch, Oozie, Java (JDK 1.6), Cloudera, Oracle 11g/10g, Windows, UNIX Shell Scripting.
Confidential, New York
- Conducted research in Social Media Analytics
- Involved in collecting, processing, analyzing and reporting social media data on a specific research topic
- Explored Social Media Analysis on Community Development Practices based on the results from R
- Performed data mining and data cleaning, and explored data visualization techniques on a variety of data stored in spreadsheets and text files using R, plotting the results with R packages
- Hands-on statistical coding using R and Advanced Excel
Environment: RStudio, RPubs, Java (v1.8), ShinyApps, Excel
- Worked on Hadoop Ecosystem using different big data analytic tools including Hive and Pig.
- Involved in loading data from the Linux file system to HDFS.
- Imported and exported data into HDFS and Hive using Sqoop.
- Implemented partitioning and bucketing in Hive.
- Ran Hadoop Streaming jobs to process terabytes of JSON data.
- Scheduled Oozie workflows to run multiple Hive and Pig jobs
- Executed Hive queries on Parquet tables stored in Hive to perform data analysis to meet the business requirements.
- Created HBase tables to store various data formats of incoming data from different portfolios.
- Created Pig Latin scripts to sort, group, join and filter enterprise-wide data.
- Developed the verification and control process for daily load.
- Provided daily production support to monitor and troubleshoot Hadoop/Hive jobs.
- Worked collaboratively with different teams to transition the project smoothly into production.
Environment: HDFS, Pig, Hive, Sqoop, Shell Scripting, HBase, Zookeeper, MySQL.
- Performed analysis for the client requirements based on detailed design documents
- Developed Use Cases, Class Diagrams, Sequence Diagrams and Data Models using Microsoft Visio
- Developed Struts forms and actions for validation of user request data and application functionality
- Developed a web service using SOAP, WSDL, XML and SoapUI
- Involved in developing the business tier using stateless session beans
- Designed the database and coded SQL, PL/SQL, triggers and views using IBMDB
- Applied the Delegate, Data Transfer Object and Data Access Object design patterns
- Developed Message Driven Beans for asynchronous processing of alerts
- Used ClearCase for source code control and JUnit for unit testing
- Simulated networks in real time using an ns-3 network simulator modified for multithreading across multiple cores, running on a generic Linux machine
- Involved in peer code reviews and performed integration testing of the modules