SUMMARY
- 3 years of experience setting up Big Data environments; delivered proofs of concept using HDFS, MapReduce, Pig, Hive, HBase, and Sqoop.
- Extensive experience in capacity planning for Hadoop clusters and in setting up complete environments.
- Excellent at writing shell and Sqoop scripts to migrate data in and out of HDFS, and Spark, Pig, and Hive scripts for data processing.
- Experience supporting data analysis projects using Elastic MapReduce (EMR) on the Amazon Web Services (AWS) cloud; performed export and import of data to and from S3.
- Capable of processing large sets of structured, semi-structured, and unstructured data, and of supporting systems and application architecture.
- Familiar with data architecture including data ingestion pipeline design, Hadoop information architecture, data modeling and data mining.
- Experience optimizing ETL workflows.
- Good team player with strong interpersonal and communication skills, combined with self-motivation, initiative, and the ability to think outside the box.
TECHNICAL SKILLS
Big Data Ecosystem: Hadoop, MapReduce, HDFS, HBase, Hive, Pig, Sqoop, Spark, Python, Cloudera 6.2.x
ETL Tools: SSIS, KNIME
Languages: Python (including PySpark), C/C++
Data Management: MS SQL, PL/SQL, NoSQL, Oracle, Hadoop, Sqoop, Hive
Web Technologies: HTML, CSS
Visualization/Others: MS Excel, Minitab, KNIME
Tools: Eclipse IDE, PyCharm, MS Visio, SQL Server Management Studio, Minitab, Excel
Platforms: Windows, Linux, VMware
Unix Tools: Shell scripts
PROFESSIONAL EXPERIENCE
Confidential
Hadoop Developer
Responsibilities:
- Configured the Sqoop client to import data into HDFS from external Teradata and Oracle relational database management systems
- Participated in data modeling sessions to develop models for Hive tables
- Created HBase tables to load large sets of structured, semi-structured, and unstructured data coming from NoSQL sources and a variety of portfolios.
- Applied Hive partitioning and bucketing concepts; created partitioned external and internal Hive tables
- Developed applications using Hadoop big data technologies: Pig, Hive, and Spark (Python)
- Designed and developed batch processing of data ingested from various sources using Apache Spark; worked extensively with Spark SQL.
- Designed and developed data pipelines (ETL processes) in Apache Spark using Spark SQL and Spark Streaming (see the sketch following this role)
- Created and managed storage format and data partitioning strategies for Hive tables
- Involved in loading data into the Hadoop Distributed File System and preprocessing it with Pig
- Collaborated with infrastructure, network, database, application, and BI teams to ensure data quality and availability
- Created reports for the BI team, using Sqoop to import data into HDFS and Hive
- Designed and developed a data ingestion framework to ingest streaming logs and reference data from customer center applications and databases into HDFS using Sqoop and Spark Streaming
- Loaded data from a Cassandra database into HDFS; configured a Hadoop environment in the cloud via Amazon Web Services (AWS) to provide a scalable, distributed data solution
- Experienced with Spark performance tuning options and Spark analytics using Spark SQL
- Administered Pig, Hive, and HBase, installing updates, patches, and upgrades
Environment: Hadoop ecosystem, Spark, Python, Pig, Hive, Cloudera 6.2.x, Sqoop, Oracle, SQL Server, Unix
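A minimal PySpark sketch of the batch ETL pattern described in this role: ingesting from a relational source over JDBC, transforming with Spark SQL, and writing a partitioned Hive table. The connection settings, table names, and columns are hypothetical placeholders, not details from an actual engagement.

```python
# Sketch of a batch ETL job: JDBC ingest -> Spark SQL transform -> partitioned Hive table.
# All table names, JDBC settings, and columns below are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("portfolio-batch-etl")
         .enableHiveSupport()
         .getOrCreate())

# Ingest reference data from the relational source over JDBC.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:oracle:thin:@//db-host:1521/ORCL")  # placeholder
          .option("dbtable", "SALES.ORDERS")                       # placeholder
          .option("user", "etl_user")
          .option("password", "********")
          .load())

# Transform with Spark SQL: filter bad rows and derive a partition column.
orders.createOrReplaceTempView("orders_raw")
daily = spark.sql("""
    SELECT order_id,
           customer_id,
           CAST(amount AS DECIMAL(12, 2)) AS amount,
           to_date(order_ts)              AS order_date
    FROM orders_raw
    WHERE amount IS NOT NULL
""")

# Persist as a partitioned Hive table (Parquet by default).
(daily.write
      .mode("overwrite")
      .partitionBy("order_date")
      .saveAsTable("analytics.orders_daily"))
```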
Confidential
Hadoop Developer
Responsibilities:
- Loaded data from relational database systems (RDBMS) into HDFS using Sqoop
- Monitored and fine-tuned MapReduce programs running on the cluster
- Solved performance issues in Hive and Pig by understanding how joins, grouping, and aggregation translate into MapReduce jobs
- Created Hive queries that helped market analysts spot emerging trends by comparing fresh data with EDW reference tables and historical metrics.
- Designed and developed data pipelines (ETL processes) in Apache Spark using Spark SQL and Spark Streaming
- Extended Hive with custom scripts for data transformation and by writing user-defined functions (UDFs)
- Worked extensively on troubleshooting Hive jobs
- Used Flume to collect, aggregate and push log data from different log servers
- Supported code/design analysis, strategy development and project planning
- Developed multiple MapReduce jobs in Python for data cleaning and preprocessing (see the mapper sketch following this role)
- Managed and reviewed Hadoop log files
- Tested raw data and executed performance scripts
- Shared responsibility for administering Hadoop, Hive, Sqoop, Spark, and Python environments
- Experienced in defining job flows
Environment: Hadoop ecosystem, Spark, Python, Pig, Hive, Cloudera 6.2.x, Sqoop, Microsoft SQL Server (SSIS, SSRS, SSAS), SQL, PL/SQL
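A minimal Hadoop Streaming mapper in Python, illustrating the kind of data-cleaning MapReduce job described in this role; the tab-delimited field layout and validation rules are hypothetical placeholders.

```python
#!/usr/bin/env python
# cleaner_mapper.py -- minimal Hadoop Streaming mapper for data cleaning.
# The field layout and validation rules are hypothetical placeholders.
import sys

def clean(line):
    """Normalize one raw record; return None to drop it."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:                       # malformed record
        return None
    user_id, ts, amount = fields[:3]
    if not amount.replace(".", "", 1).isdigit():
        return None                           # non-numeric amount
    return "\t".join([user_id.strip().lower(), ts, amount])

if __name__ == "__main__":
    for raw in sys.stdin:
        cleaned = clean(raw)
        if cleaned is not None:
            print(cleaned)
```

Submitted through the standard streaming jar, e.g. `hadoop jar hadoop-streaming.jar -files cleaner_mapper.py -mapper cleaner_mapper.py -input /raw/logs -output /clean/logs` (paths are placeholders); a reducer is optional for a pure cleaning pass.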