Sr. Big Data/Hadoop Developer Resume
Livonia, MI
SUMMARY
- Over 7 years of experience in the IT industry with a strong emphasis on Object-Oriented Analysis, Design, Development, Implementation, Testing and Deployment of Big Data and web-enabled software applications.
- Experienced with Hadoop and its ecosystem, including HDFS, MapReduce, Pig, Hive, HBase, Sqoop, ZooKeeper and Oozie.
- Hands-on experience developing Hadoop architecture on Windows and Linux platforms.
- Expertise in writing Java MapReduce jobs and HiveQL for data architects and data scientists.
- Experienced in loading data from Oracle and MySQL databases into HDFS using Sqoop (structured data) and Flume (log files and XML).
- Experienced in optimizing MapReduce jobs using mappers, reducers, combiners and partitioners to deliver the best results on large datasets.
- Experienced in working with Oracle, DB2 and SQL Server, and with Java concepts such as multithreading, collections, OOP and I/O operations.
- Excellent understanding of NoSQL databases and hands-on experience writing applications on NoSQL databases such as Cassandra and MongoDB.
- Experienced in the Software Development Life Cycle (requirements analysis, design, development, testing, deployment and support).
- Expertise with SQL, PL/SQL and database concepts.
- Experienced in working with different data sources such as flat files, XML files and databases.
- Expertise in programming and data mining with R, Python, Java and Scala.
- Hands-on experience with advanced Big Data technologies such as the Spark ecosystem (Spark SQL, MLlib, SparkR and Spark Streaming), Kafka, and predictive analytics (MLlib and R ML packages, including 0xdata's H2O library).
- Experienced in working with QA to develop Hadoop test plans, test scripts and test environments, and to understand and resolve defects.
- Experienced in database development, ETL and reporting using SQL Server DTS, SQL, SSIS, SSRS, Crystal XI and SAP BO.
- Experienced in creating complex SQL queries, SQL tuning, and writing PL/SQL blocks such as stored procedures, functions, cursors, indexes, triggers and packages.
- Expertise in cross-platform (PC/Mac, desktop, laptop, tablet) and cross-browser (IE, Chrome, Firefox, Safari) development.
- Passionate about big data wrangling, predictive modeling and interactive, meaningful visualization to drive business value effectively.
TECHNICAL SKILLS
Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, ZooKeeper, Oozie, Avro, HBase
Languages: Java, FoxPro, Linux shell script, SQL, C, R, Python, and Scala
Web Technologies: ASP, HTML, XML, JavaScript, JSON
IDE/Tools: Eclipse, VMware, Apache, VSS, TFS 2008, Visio
GUI: Visual Basic 6.0, Oracle, MS Office (Word, Excel, Outlook, PowerPoint, Access)
Browsers: Google Chrome, Mozilla Firefox, IE8
Reporting Tools: Crystal XI, SAP BO 4.1, Dashboard, InfoView
DB Languages: MS SQL, MS Access, MySQL, Pervasive SQL & Oracle
Operating Systems: Windows XP/7/8, Linux/UNIX
PROFESSIONAL EXPERIENCE
Confidential, Livonia, MI
Sr. Big Data/Hadoop Developer
Responsibilities:
- Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Flume, Oozie, ZooKeeper and Sqoop.
- Created a POC to store server log data in MongoDB to identify system alert metrics.
- Developed machine-learning-driven models using R and Spark MLlib.
- Responsible for smooth, error-free configuration of the DWH ETL solution and its integration with Hadoop.
- Configured Cassandra and Flume on the existing Hadoop cluster.
- Implemented a Hadoop framework to capture user navigation across the application, validate the user interface and provide analytic feedback and results to the UI team.
- Developed Python scripts to automate routine report generation for JSTOR usage data.
- Loaded data into the cluster from dynamically generated files using Flume and from relational database management systems using Sqoop.
- Performed analysis on unused user navigation data by loading it into HDFS and writing MapReduce jobs.
- Extensively involved in installation and configuration of the Cloudera distribution of Hadoop, including the NameNode, Secondary NameNode, JobTracker, TaskTrackers and DataNodes.
- Worked with Cassandra for non-relational data storage and retrieval on enterprise use cases.
- Wrote MapReduce jobs using the Java API and Pig Latin (a representative Java sketch follows this section).
- Converted procedures, functions and UNIX scripts to Greenplum functions.
- Loaded data from Teradata into HDFS using Teradata Hadoop connectors.
- Wrote Pig scripts to run ETL jobs on the data in HDFS.
- Used Hive to analyze the data and identify correlations.
- Worked on importing and exporting data from Oracle and DB2 into HDFS and Hive using Sqoop.
- Worked on NoSQL databases including HBase and MongoDB. Configured a MySQL database to store the Hive metadata.
- Used Sqoop to import data from MySQL into HDFS on a regular basis.
- Implemented data loading using Spark, Storm, Kafka, Elasticsearch, Logstash, Kibana, Redis and Flume from external CSV and text files into Netezza and Greenplum databases as raw data.
- Extracted files from MongoDB through Sqoop, placed them in HDFS and processed them.
- Maintained and monitored clusters.
Environment: Hadoop, MapReduce, HDFS, Flume, Pig, Hive, HBase, Sqoop, ZooKeeper, Cloudera, Oozie, MongoDB, Cassandra, SQL*Plus, NoSQL, ETL, MySQL, Agile, Windows, UNIX Shell Scripting.
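A minimal sketch of the kind of Java MapReduce job referenced above, counting page hits from user navigation logs. The class names, the tab-delimited log layout and the field positions are illustrative assumptions rather than the production code.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PageHitCount {

    // Mapper: emits (pageUrl, 1) for every navigation log line.
    public static class HitMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text page = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] fields = value.toString().split("\t");
            if (fields.length > 2) {            // assumed layout: user, timestamp, page
                page.set(fields[2]);
                context.write(page, ONE);
            }
        }
    }

    // Reducer (also used as combiner): sums the hit counts per page.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "page-hit-count");
        job.setJarByClass(PageHitCount.class);
        job.setMapperClass(HitMapper.class);
        job.setCombinerClass(SumReducer.class);   // combiner cuts shuffle volume
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```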
Confidential, Tampa FL
Sr. Big Data/Hadoop Developer
Responsibilities:
- Developed and supported MapReduce programs running on the cluster.
- Created Hive tables and worked on them using HiveQL.
- Involved in installing Hadoop Ecosystem components.
- Validated NameNode and DataNode status in an HDFS cluster.
- Loaded home mortgage data from the existing DWH tables (SQL Server) into HDFS using Sqoop.
- Configured Spark Streaming to receive real-time data from Kafka and store the stream data in HDFS using Scala (see the sketch after this section).
- Managed and reviewed Hadoop log files.
- Responsible for managing data coming from different sources.
- Created clusters in EMR with Hive, Spark, Hue, Zeppelin-Sandbox, Ganglia and Presto-Sandbox, scaled up the nodes, and automated cluster launches using Python scripts.
- Supported HBase architecture design on Hadoop to develop a database design in HDFS.
- Supported MapReduce programs running on the cluster.
- Involved in HDFS maintenance and loading of structured and unstructured data.
- Worked on analyzing the Hadoop cluster and different Big Data components, including Pig, Hive, Spark, HBase, Kafka, Elasticsearch, Logstash, Kibana and Sqoop.
- Wrote MapReduce jobs using Java API.
- Migrated 100+ TB of data from different databases (Netezza, Oracle, SQL Server) to Hadoop.
- Exported data from HDFS to Teradata using Sqoop on a regular basis. Worked with Hive, Pig, Sqoop, Flume, Kafka, Falcon, ZooKeeper, YARN and Oozie on the Hadoop cluster.
- Copied data from HDFS to MongoDB using Pig, Hive and MapReduce scripts, and visualized the processed streaming data in a Tableau dashboard.
- Wrote Hive queries for data analysis to meet business requirements.
- Developed UDFs for Pig data analysis.
- Developed scripts and batch jobs to schedule various Hadoop programs.
- Upgraded the Hadoop cluster from CDH3 to CDH4, set up a high-availability cluster and integrated Hive with existing applications.
- Handled importing of data from various data sources and performed transformations using Hive and MapReduce.
- Analyzed the data by running Hive queries and Pig scripts to understand user behavior.
- Created user-defined functions for loading data from SQL Server to Greenplum.
- Installed the Oozie workflow engine to run multiple Hive and Pig jobs.
- Developed Hive queries to process the data and generate data cubes for visualization.
- Optimized the mappings using various optimization techniques and also debugged some existing mappings.
- Developed HiveQL scripts to manipulate the data in HDFS.
- Worked on the HBase architecture design for the loan volumes in HDFS.
- Installed and configured Hive and wrote Hive and HDFS scripts.
Environment: Java, Hadoop, MapReduce, HDFS, Hive, Pig, Linux, XML, Eclipse, Cloudera, CDH3/4 Distribution, SQL Server, Oracle 11i, MySQL
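A hedged Java sketch of the Kafka-to-HDFS Spark Streaming pipeline described above (the original work used Scala). It assumes Spark 1.6+ with the spark-streaming-kafka 0.8 receiver API; the topic name, ZooKeeper quorum and HDFS path are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaPairReceiverInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;

public class KafkaToHdfs {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("mortgage-stream-to-hdfs");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.seconds(30));

        // One receiver thread on the assumed topic; host and topic names are placeholders.
        Map<String, Integer> topics = new HashMap<>();
        topics.put("mortgage-events", 1);
        JavaPairReceiverInputDStream<String, String> stream =
                KafkaUtils.createStream(jssc, "zk-host:2181", "hdfs-loader", topics);

        // Keep only the message payload and write each non-empty micro-batch to HDFS.
        JavaDStream<String> lines = stream.map(record -> record._2());
        lines.foreachRDD(rdd -> {
            if (!rdd.isEmpty()) {
                rdd.saveAsTextFile("hdfs:///data/streams/mortgage/" + System.currentTimeMillis());
            }
        });

        jssc.start();
        jssc.awaitTermination();
    }
}
```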
Confidential, Golden Valley, MN
Hadoop Developer
Responsibilities:
- Supported MapReduce programs running on the cluster.
- Implemented solutions for ingesting data from various sources and processing the data at rest using Big Data technologies such as Hadoop, the MapReduce framework, HBase, Hive, Oozie, Flume and Sqoop.
- Imported bulk data into HBase using MapReduce programs.
- Developed Apache Pig and Hive scripts to process the HDFS data.
- Performed analytics on time-series data stored in HBase using the HBase API.
- Designed and implemented incremental imports into Hive tables.
- Involved in collecting, aggregating and moving data from servers to HDFS using Apache Flume.
- Wrote Hive jobs to parse the logs and structure them in tabular format to facilitate effective querying of the log data.
- Wrote multiple Java programs to pull data from HBase.
- Optimized MapReduce algorithms using combiners and partitioners to deliver the best results, and worked on application performance optimization for an HDFS cluster.
- Implemented business logic by writing UDFs in Java and used various UDFs from Piggybank and other sources (a hedged UDF sketch follows this section).
- Worked on debugging and performance tuning of Hive and Pig jobs.
- Created and maintained technical documentation for launching Hadoop clusters and for executing Hive queries and Pig scripts.
Environment: Java, Hadoop 2.1.0, MapReduce 2, Pig 0.12.0, Hive 0.13.0, Linux, Sqoop 1.4.2, Flume 1.3.1, Eclipse, AWS EC2, and Cloudera CDH 4
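A minimal sketch of a Java UDF for Pig of the kind mentioned above. The class name and the normalization rule are hypothetical; the real UDFs carried the project's business logic.

```java
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Hypothetical Pig UDF: normalizes a free-text field (e.g. a log severity) to upper case.
public class NormalizeField extends EvalFunc<String> {
    @Override
    public String exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null) {
            return null;   // Pig treats a null return as a null field
        }
        return input.get(0).toString().trim().toUpperCase();
    }
}
```

In Pig Latin the jar containing this class would be registered with REGISTER and the function aliased with DEFINE before being used in a FOREACH ... GENERATE statement.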
Confidential, Wellesley, MA
Hadoop Developer/Data Analyst
Responsibilities:
- Involved in identifying the source data from different systems and mapping the data into the warehouse.
- Monitored the AWS Hadoop cluster using Cloudera Manager, adding nodes, decommissioning dead nodes and monitoring health checks.
- Configured and monitored a MongoDB cluster in AWS and established connections from Hadoop to MongoDB for data transfer.
- Used the Scala API for programming in Apache Spark.
- Connected Tableau from the client end to the AWS IP addresses and viewed the end results.
- Installed Kafka on the Hadoop cluster and wrote the producer and consumer code in Java to establish a connection from the Twitter source to HDFS (see the producer sketch after this section).
- Copied data from HDFS to MongoDB using Pig, Hive and MapReduce scripts, and visualized the processed streaming data in a Tableau dashboard.
- Developed shell scripts to automate the conversion from JSON to BSON.
- Used Kafka with twitter4j to stream data from the source to Hadoop.
- Performed offline analysis on HDFS and sent the results to MongoDB to update the information in the existing tables; the Hadoop-to-MongoDB transfer was done with MapReduce, Hive and Pig scripts using the Mongo-Hadoop connectors.
- Used security systems such as Kerberos.
- Responsible for the end-to-end process of Hadoop cluster installation, configuration and monitoring.
- Responsible for building scalable distributed data solutions using Hadoop
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster.
- Conducted research and analysis in the capital market, covering the stock market and the bond market.
- Developed simple to complex MapReduce jobs using Hive and Pig.
- Applied statistics and machine learning algorithms on distributed architectures using Mahout/R.
- Extracted meaningful data from unstructured data on Hadoop Ecosystem.
- Handled importing of data from various data sources, performed transformations using Hive, MapReduce, loaded data into HDFS and extracted the data from MySQL into HDFS using Sqoop
- Worked with application teams to install operating system and Hadoop updates, patches and version upgrades as required.
Environment: Hadoop, HBase, HDFS, MapReduce, Cloudera, Ganglia, Pig Latin, Sqoop, Hive, Pig, MySQL, Oozie, Flume, ZooKeeper, R, and Python.
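A hedged Java sketch of the Kafka producer side of the Twitter-to-HDFS pipeline described above. The broker address and topic name are assumptions, and the JSON literal stands in for the payload a twitter4j listener would deliver.

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class TweetPublisher {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");   // assumed broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // In the real pipeline this payload would come from a twitter4j callback.
            String tweetJson = "{\"id\":1,\"text\":\"sample tweet\"}";
            producer.send(new ProducerRecord<>("tweets", tweetJson));
        }
    }
}
```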
Confidential, Louisville, KY
Hadoop Administrator/Developer
Responsibilities:
- Handled the end-to-end process of Hadoop cluster installation, configuration and monitoring.
- Built scalable distributed data solutions using Hadoop.
- Installed and configured Hive, Pig, Sqoop, Flume and Oozie on the Hadoop cluster
- Involved in the creation of Greenplum functions and views for customer use.
- Developed simple to complex MapReduce jobs using Hive and Pig (see the Hive query sketch after this section).
- Applied statistics and machine learning algorithms using Mahout/R.
- Extracted meaningful data from unstructured data on Hadoop Ecosystem.
- Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS and extracted data from MySQL into HDFS using Sqoop.
- Worked with application teams to install operating system, Hadoop updates, patches, version upgrades as required.
Environment: Hadoop, HBase, HDFS, MapReduce, Cloudera, Ganglia, Pig Latin, Sqoop, Hive, Pig, MySQL, Oozie, Flume, ZooKeeper.
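Hive queries like the ones described above compile into MapReduce jobs on the cluster. A minimal sketch of submitting such an aggregation over the HiveServer2 JDBC interface; the host, credentials, table and column names are assumptions.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryRunner {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver (hive-jdbc); connection details are placeholders.
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection con = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver2-host:10000/default", "hadoop", "");
             Statement stmt = con.createStatement()) {

            // A grouped aggregation of this shape runs as a MapReduce job.
            ResultSet rs = stmt.executeQuery(
                    "SELECT region, COUNT(*) AS loan_count "
                  + "FROM loan_events GROUP BY region");

            while (rs.next()) {
                System.out.println(rs.getString("region") + "\t" + rs.getLong("loan_count"));
            }
        }
    }
}
```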