- Experienced with the Hadoop ecosystem (HDFS, Pig, Hive, Sqoop, and HBase), processing large sets of structured and semi-structured data for analysis and creating reports to support business decisions.
- Extensive experience with ETL and big data query tools such as Pig Latin and HiveQL.
- Developed Pig scripts to pre-process raw data and move it into HDFS, and created ad-hoc Hive queries for business users' further analysis.
- Good understanding of partitioning and bucketing in Hive; designed both managed and external Hive tables to optimize performance.
- Worked with the NoSQL database HBase and performed data analysis on HBase data through Hive external tables mapped to HBase.
- Highly proficient in Python programming, R for visualization, and Linux shell scripting for automating import, export, load, and backup jobs.
- Scheduled cron jobs for Unix, HDFS, Pig, and Hive tasks to automate the analysis process.
- Worked across the SDLC: performed requirements analysis, wrote test strategies and test cases for ETL, web, and mobile applications, and reported issues using defect-tracking tools such as Jira.
- Extensive experience in data analytics using Excel, Excel Solver, and Google Analytics; also have working experience with MySQL.
- Extensive experience writing SQL queries to retrieve data for further analysis.
- Proficient in Waterfall and Agile (Scrum) software development methodologies.
Big Data Ecosystem: HDFS, MapReduce, Pig, Hive, Sqoop, Oozie, ZooKeeper, Flume, HBase, Storm, Spark
Big Data design patterns: Data Acquisition and Ingestion, Data Analytics, Data Storage, Data Management
Databases: Db2 SQL, MySQL
Languages: Python, Shell, R, SQL, Pig Latin, HiveQL, Scala
Office Tools: Microsoft Office Suite, FileZilla, VPN, PuTTY, @Risk, Excel Solver
Operating Systems: Windows, Linux
Development Tools: MySQL, PyCharm
Quality related tools: HP Quality Center, Jira
Development Methodologies: Agile/Scrum, Waterfall
Confidential, Chicago, IL
- Translated complex functional and technical requirements into detailed designs that capture the business criteria for ad-hoc requests.
- Developed data pipelines using Sqoop, Pig, and Hive to ingest customer and historical data and aggregate large volumes of log data into HDFS to study customer behavior and sales trends.
- Used Pig as an ETL tool to perform transformations, event joins, filtering, and pre-aggregations, storing the results in HDFS for further analysis and business decision-making.
- Worked with managed and external Hive tables, applied partitioning and bucketing to optimize performance, and created ad-hoc queries that helped business users optimize their sales and marketing initiatives.
- Used R to fetch data for visualization and to generate trend reports for the BI team.
- Automated and scheduled recurring Sqoop and ETL jobs using Unix shell scripts.
- Debugged and performance-tuned Hive and Pig jobs.
- Worked in an Agile/Scrum environment and participated in Scrum meetings covering project analysis, specifications, and development.
Environment: Apache Hadoop, MySQL, Linux, Windows, UNIX, Sqoop, Hive, Pig, HBase, Python
Confidential, Milwaukee, WI
- Based on business requirements, developed Pig scripts with joins and aggregations to implement business rules and transformations.
- Developed Hive queries to categorize data by various factors, analyze the partitioned and bucketed data, and compute metrics for reporting.
- Connected Hive tables to Tableau for data visualization and report generation, supporting the business team's decision-making.
- Developed several shell scripts that act as wrappers to launch Hadoop jobs and set their configuration parameters.
- Implemented shell scripts, run as cron jobs, to automate data migration from external servers and FTP sites.
- Performed sentiment analysis on product reviews from the client's website and exported the resulting sentiment data to Tableau to build dashboards.
Environment: Hadoop, MapReduce, HDFS, Hive, Java, SQL, Sqoop, CentOS.
Confidential - Columbus, OH
- Worked with business analysts to collect and understand business requirements, and built a data structure for a customer 360-degree view by enriching data from various departments.
- Worked with the testing team to support testing of ETL jobs and processes, and provided the required test cases.
- Developed the ETL test strategy, ETL mapping document, and SQL queries to support ETL testing.
- Automated initial and incremental data loads using sequencers with the required dependencies, according to business requirements.
Environment: Linux, Unix, SQL, MySQL, Shell Script, QC
Confidential, Aurora, IL
- Took an active part in all stages of the Agile software development life cycle from a QA perspective, from walkthroughs of business requirements and analysis of functional designs through maintenance of the completed product.
- Participated in team meetings on the testing process in order to complete testing activities ahead of schedule.
- After registering, updating, or removing patient information, used SQL queries to verify that the database reflected the change.
- Involved in the design, analysis, testing, and implementation of the data warehouse.
- Developed ETL test plans based on the test strategy. Created and executed test cases and test scripts based on the test strategy, test plans, and ETL mapping document.
- Used Jira extensively to track defects and report bugs.
- Performed extensive manual GUI, functional, integration, regression, and user acceptance testing.
Environment: Windows XP, SQL Server, Oracle, MS Excel, Quality Center.