- 6+ years of experience in analysis, architecture, design, development, testing, maintenance and user training of software applications. Experience in developing MapReduce programs using Apache Hadoop to analyze big data as per requirements.
- Good working knowledge of data transformation and loading using export and import. Hands-on experience using Sqoop to import data into HDFS from RDBMS and vice versa.
- Experience in analyzing data using Hive, Pig Latin, and custom MapReduce programs in Java.
- Hands-on experience with workflow scheduling, coordination and messaging tools such as Oozie, ZooKeeper and Kafka.
- Developed small distributed applications using ZooKeeper and scheduled workflows using Oozie.
- Used Pig as an ETL tool to perform transformations, event joins, filtering and some pre-aggregation.
- Clear understanding of Hadoop architecture and its components, such as HDFS, JobTracker, TaskTracker, NameNode, DataNode and Secondary NameNode, and of MapReduce programming.
- Expertise in writing custom UDFs to extend Hive and Pig core functionality.
- Hands-on experience extracting data from log files and copying it into HDFS using Flume.
- Experience with NoSQL databases such as HBase.
- Experience in Hadoop administration activities such as installation and configuration of clusters using Apache and Cloudera.
- Knowledge of installing, configuring and using Hadoop components such as MapReduce (MR1), YARN (MR2), HDFS, Hive, Pig, Flume and Sqoop.
- Experience in dimensional data modeling using star and snowflake schemas.
- Extensive experience working with Oracle, DB2, SQL Server and MySQL databases and with core Java concepts such as OOP, multithreading, collections and I/O.
- Developed applications using Java, RDBMS, and Linux shell scripting.
- Experience in the complete project life cycle of client-server and web applications.
- Good understanding of Data Mining and Machine Learning techniques.
- Good interpersonal and communication skills, strong problem-solving skills, quick to explore and adapt to new technologies, and a good team member.
Big Data Technologies: HDFS, MapReduce, Hive, Pig, Sqoop, Oozie, Flume, Impala, Zookeeper, Kafka, Cloudera CDH5.5
Programming Languages: Java, SQL, PL/SQL, Pig Latin, HiveQL and UNIX shell scripting
Databases: Oracle, MySQL, SQL Server; familiar with NoSQL (HBase).
Web Technologies: JavaBeans, JDBC, HTML, CSS, XML, JavaScript.
Operating Systems: Windows, UNIX, Linux distributions (CentOS, Ubuntu)
Confidential, Parsippany, NJ
- Responsible for building scalable distributed data solutions using Hadoop.
- Imported data from various data sources, performed transformations using Hive and MapReduce, loaded the data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
- Implemented MapReduce programs to analyze large datasets in the warehouse for business intelligence purposes. Used the default MapReduce input and output formats.
- Developed HQL named queries to implement select, insert, update and other operations on the database.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Developed simple to complex MapReduce jobs in Java, and scripts using Hive and Pig.
- Analyzed the data by performing Hive queries (HiveQL) and running Pig scripts (Pig Latin) for data ingestion and egress.
- Implemented business logic by writing UDFs in Java and used various UDFs from other sources.
- Loaded and transformed large sets of structured and semi-structured data.
- Managed and reviewed Hadoop log files; deployed and maintained the Hadoop cluster.
- Exported filtered data into HBase for fast querying.
- Involved in creating Hive tables, loading them with data and writing Hive queries.
- Created data models for customer data using the Cassandra Query Language.
- Ran many performance tests using the cassandra-stress tool to measure and improve the read and write performance of the cluster.
- Involved in developing shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and to move data files into and out of HDFS.
- Supported setting up the QA environment and updating configurations for running Pig, Hive and Sqoop scripts.
Environment: Apache Hadoop (Cloudera), HBase, Hive, Pig, MapReduce, Sqoop, Oozie, Eclipse, Java.
Confidential - Orlando, FL
- Developed MapReduce programs to parse the raw data, populate staging tables and store the refined data in partitioned tables in the EDW.
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Enabled speedy reviews and first-mover advantages by using Oozie to automate data loading into the Hadoop Distributed File System and Pig to pre-process the data.
- Provided design recommendations and thought leadership to sponsors/stakeholders that improved review processes and resolved technical problems.
- Managed and reviewed Hadoop log files. Tested raw data and executed performance scripts. Shared responsibility for administration of Hadoop, Hive and Pig. Built wrapper shell scripts to hold the Oozie workflow.
- Involved in collecting and aggregating large amounts of log data using Apache Flume and staging data in HDFS for further analysis.
- Provided ad-hoc queries and data metrics to business users using Hive and Pig.
- Developed Pig Latin scripts to extract data from web server output files and load it into HDFS. Used Pig as an ETL tool to perform transformations, event joins and some pre-aggregations before storing the data in HDFS.
- Worked on MapReduce joins to query multiple semi-structured data sets as per analytic needs. Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
- Created Hive tables, loaded data into them and wrote Hive UDFs.
- Stored and loaded data from HDFS to Amazon S3 and backed up the namespace data to NFS filers.
- Enabled concurrent access to Hive tables with shared and exclusive locking, which Hive supports through the ZooKeeper implementation in the cluster.
- Wrote shell scripts to monitor the health of Hadoop daemon services and respond to any warning or failure conditions.
- Familiarity with NoSQL databases including HBase.
- Wrote shell scripts to automate rolling day-to-day processes.
Environment: Hadoop, MapReduce, YARN, Hive, HDFS, Pig, Sqoop, Oozie, Cloudera, Flume, HBase, ZooKeeper, CDH3, Oracle, NoSQL and Unix/Linux, Kafka, Amazon Web Services.
Big Data/Hadoop Developer
Confidential - Kansas City, MO
- Involved in creating Hive tables, and loading and analyzing data using Hive queries.
- Developed and executed custom MapReduce programs, Pig Latin scripts and HQL queries.
- Worked on importing the data from different databases into Hive Partitions directly using Sqoop.
- Performed data analytics in Hive and then exported the metrics to RDBMS using Sqoop.
- Involved in running Hadoop jobs for processing millions of records of text data.
- Extensively used Pig for data cleaning and optimization.
- Implemented complex MapReduce programs to perform map-side joins using the distributed cache.
- Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
- Responsible for troubleshooting issues in the execution of MapReduce jobs by inspecting and reviewing log files.
- Extracted tables using Sqoop, placed them in HDFS and processed the records.
Environment: CDH, Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Unix.
Big Data/Hadoop Consultant
- Developed Big Data Solutions that enabled the business and technology teams to make data-driven decisions on the best ways to acquire customers and provide them business solutions.
- Developed various MapReduce programs to cleanse the data and make it consumable by Hadoop.
- Migrated the existing data to Hadoop from RDBMS (Oracle) using Sqoop for processing the data.
- Used Sqoop export to move the data back to the RDBMS.
- Used various compression codecs to effectively compress the data in HDFS.
- Wrote Pig Latin scripts to run advanced analytics on the collected data.
- Created Hive internal and external tables with appropriate static and dynamic partitions for efficiency.
- Used Avro SerDes for serialization and deserialization, and implemented custom Hive UDFs involving date functions.
- Worked on a POC to benchmark the efficiency of Avro vs Parquet.
- Implemented the end to end workflow for extraction, processing and analysis of data using Oozie.
- Used various optimization techniques to tune Hive, Pig and Sqoop.
Environment: CDH, Hadoop, HDFS, MapReduce, Hive, Pig, Sqoop, Unix.
- Completed analysis, requirements gathering and functional design document creation.
- Wrote various SQL statements for application development.
- Responsible for designing advanced SQL queries, procedures, cursors, triggers and scripts.
- Collaborated with the application developers in data modeling and E-R design of the systems.
- Created, modified, maintained and optimized SQL Server databases.
- Maintained documents and created reports for business analysts and end users.
- Managed database performance, backup, replication, capacity and security.
- Coordinated user acceptance testing among business users.
- Developed many SQL stored procedures, views and indexed views for business reporting.
Environment: SQL Server 2000/2005, SQL, PL/SQL and Windows XP/2005.