Hadoop/Spark Developer Resume
Jacksonville, FL
SUMMARY
- 8+ years of experience building and developing Hadoop MapReduce solutions, with hands-on use of Hive, Pig, Spark, Storm, and Kafka.
- Strong proficiency in R for data transformation, filtering, and analytics.
- Experience with RDD architecture, implementing Spark operations on RDDs, and optimizing Spark transformations and actions.
- Good knowledge of Spark components such as Spark SQL, MLlib, Spark Streaming, and GraphX.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Developed code to write canonical-model JSON records from various input sources to Kafka queues.
- Well versed in building, configuring, monitoring, and supporting Hadoop environments using Cloudera Manager, Hortonworks, AWS, and Apache Ambari.
- Managed, configured, installed, and supported Cloudera Hadoop CDH 5 and IBM InfoSphere BigInsights.
- Experience importing and exporting data between RDBMS and HDFS using Flume and Sqoop.
- Developed various Pig and Hive UDFs (User Defined Functions) to extend functionality and solve multiple big data filtering problems.
- Implemented HBase row key design and integration of Hive with HBase.
- Good knowledge of AWS tools: EC2, VPC, Route 53, CloudTrail, CloudWatch, IAM, and S3.
- Strong experience developing MapReduce solutions on the Hadoop ecosystem to solve various big data problems.
- Strong experience in MapReduce programming, customizing the framework at various levels, and working with input formats such as SequenceFileInputFormat and KeyValueTextInputFormat.
- Strong experience tuning RecordReaders and developing custom InputFormats to run MapReduce over highly unstructured data.
- Skilled in optimizing MapReduce performance by applying compression to the intermediate data created between the map and reduce tasks (see the sketch after this summary).
- Worked with the NoSQL databases MongoDB, HBase, and Cassandra to provide faster access to data on HDFS.
- Strong knowledge of MongoDB concepts such as locking, transactions, and indexes; experienced in managing MongoDB environments for scalability.
- Experience developing web pages using HTML, CSS, JavaScript, D3.js, Knockout.js, AJAX, and Underscore.js.
- Proficient with data formats including XML and JSON, and comfortable developing RESTful web services.
- Strong background as a Java developer using Java, J2EE, JSP, JDBC, SQL, and Hibernate.
- Proficient with version control tools GitHub and SVN.
- Strong experience with Unix commands and shell scripting.
- Experience with development tools Eclipse, NetBeans, PyCharm, IntelliJ, and Android Studio.
- Strong experience using MS Office and Excel for documentation and reporting.
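As an illustration of the shuffle-compression bullet above, here is a minimal sketch of how intermediate map output is typically compressed on MR2; the class and job names are hypothetical, and it assumes the Snappy codec is installed on the cluster.

```java
// Minimal sketch: compress the intermediate data written between map and reduce
// tasks. Class and job names are illustrative; mapper, reducer, and paths omitted.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.SnappyCodec;
import org.apache.hadoop.mapreduce.Job;

public class CompressedShuffleJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Turn on compression for map output (the shuffle data), assuming Snappy is available.
        conf.setBoolean("mapreduce.map.output.compress", true);
        conf.setClass("mapreduce.map.output.compress.codec", SnappyCodec.class, CompressionCodec.class);

        Job job = Job.getInstance(conf, "compressed-shuffle-sketch");
        // Mapper, reducer, and input/output paths would be configured here as usual.
    }
}
```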
TECHNICAL SKILLS
Big Data: Hadoop HDFS, MapReduce v2, Pig, Hive, HBase, Oozie, Spark, Kafka, Storm, ZooKeeper, Flume
Java Confidential: Core Java, J2EE, Servlets, JSP, JDBC, JNDI, JavaBeans, Hibernate, JavaScript, jQuery, Applets, Swing, Struts
AWS tools: EC2, VPC, Route 53, S3, IAM, CloudWatch, CloudTrail, Glacier, Elasticsearch
Programming Languages: C, C++, R, Java, Python, Scala, UNIX Shell Scripting
Databases: Oracle 11g, MySQL, DB2, Microsoft SQL Server, MS Access
NoSQL Databases: MongoDB, HBase, Apache Cassandra, Amazon DynamoDB, Neo4j
IDE tools: Eclipse, NetBeans, PyCharm, IntelliJ, Android Studio
Virtualization Confidential: VMware ESX Server, Microsoft Hyper-V Server, Citrix XenServer
Operating Systems: Linux, Unix, Windows XP/7/8/10, Windows Server 2003/2008, Mac, AMI
Web Confidential: HTML, CSS, JavaScript
Data Visualization: D3, Tableau, R
Networking Protocols: TCP/IP, UDP, HTTP, HTTPS, FTP, SMTP, SNMP, POP3
Unit Testing Tools: JUnit, TestNG
Version Control: GitHub, CVS, SVN
ETL Tools: Informatica, Pentaho
PROFESSIONAL EXPERIENCE
Confidential, Jacksonville, FL
Hadoop/Spark Developer
Responsibilities:
- Worked on the Hadoop cluster and used Hive as the data querying tool to store and retrieve data.
- Involved in the complete Software Development Life Cycle (SDLC) while developing applications.
- Reviewed and managed Hadoop log files by consolidating logs from multiple machines using Flume.
- Implemented a custom InputFormat and RecordReader to read XML input efficiently using a SAX parser.
- Set up Storm and Kafka clusters in the AWS environment and monitored and troubleshot the clusters.
- Documented the data flow from Application > Kafka > Storm > HDFS > Hive tables.
- Converted HiveQL to Spark SQL, connected JDBC drivers to Spark, and edited configuration parameters.
- Wrote Spark SQL queries using Scala.
- Loaded data from the Linux file system to HDFS and vice versa.
- Used CSVExcelStorage to parse data with different delimiters in Pig.
- Installed and monitored Hadoop and ecosystem tools on multiple operating systems: Ubuntu, CentOS, and SUSE 11.
- Exported analyzed output data to relational databases using Sqoop, visualized it with Tableau, and forwarded it to the BI team for report generation.
- Developed Pig Latin scripts to sort, join, and filter enterprise data.
- Implemented test scripts to support test-driven development and continuous integration.
- Developed multiple MapReduce jobs in Java to clean datasets.
- Worked with the Oozie workflow engine to run multiple MapReduce jobs.
- Filtered datasets by developing custom Hive user-defined functions that run on top of MapReduce (see the UDF sketch after this section).
- Supported setting up the QA environment.
- Worked with the applications team to install Hadoop updates and upgrades as required.
Environment: Hadoop MapReduce 2, HDFS, Pig, Hive, Flume, Eclipse, Java, Sqoop, Kafka, Storm
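To accompany the Hive UDF bullet above, here is a minimal, hypothetical sketch of a filtering-style UDF written against the classic org.apache.hadoop.hive.ql.exec.UDF API; the class name and cleanup rule are illustrative, not the original production code.

```java
// Illustrative Hive UDF: strips non-printable characters so downstream filters
// can compare fields reliably.
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class StripNonPrintable extends UDF {
    public Text evaluate(Text input) {
        if (input == null) {
            return null; // Hive passes NULLs through unchanged.
        }
        return new Text(input.toString().replaceAll("\\p{C}", ""));
    }
}
```

A UDF like this is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION and then executes inside the MapReduce tasks that Hive launches, which is what the bullet above refers to.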
Confidential, Hagerstown, MD
Hadoop / Spark Developer
Responsibilities:
- Worked with the RDD architecture and implemented Spark operations on RDDs.
- Implemented various machine learning models such as Random Forest, SVM, and k-NN using the Spark MLlib component.
- Implemented a de-duplication process to avoid duplicates in the daily load.
- Developed MapReduce programs to extract and transform datasets; results were exported back to the RDBMS using Sqoop on Hortonworks.
- Involved in data modeling and in sharding and replication strategies for MongoDB.
- Experienced with Spark performance tuning options.
- Developed code to write canonical-model JSON records from various input sources to Kafka queues (a producer sketch follows this section).
- Implemented Oozie workflows for MapReduce, Hive, and Sqoop actions.
- Analyzed data by running Hive queries on existing databases and analyzed system performance using Hortonworks HDP.
- Worked on the integration layer that stores data from REST services into MongoDB.
- Experience working with NoSQL databases including MongoDB and HBase.
- Developed Storm bolts and topologies involving Kafka spouts to stream data from Kafka.
- Managed a small Hortonworks cluster for the testing environment using Ambari.
- Transferred queried data to Tableau via the JDBC connector for visualization, and used the same data for D3 visualizations.
- Designed and implemented delta data load systems in Hive, which increased efficiency by more than 60%.
- Coordinated cluster services using ZooKeeper.
- Exported analyzed data to relational databases using Sqoop and visualized it using Tableau and R.
- Developed an application to migrate files from S3 to HDFS.
Environment: Hadoop MR2, HDFS, Spark 0.8.0, Kafka 0.8.1.1, Hive, ZooKeeper, Oozie
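To illustrate the canonical-model JSON bullet above, here is a minimal producer sketch against the old Kafka 0.8.x producer API listed in this environment; the broker list, topic name, and sample record are hypothetical.

```java
// Illustrative Kafka 0.8.x producer: sends one JSON record to a topic.
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class CanonicalJsonProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // hypothetical brokers
        props.put("serializer.class", "kafka.serializer.StringEncoder");

        Producer<String, String> producer = new Producer<String, String>(new ProducerConfig(props));
        String record = "{\"id\":\"42\",\"source\":\"sample-feed\",\"payload\":\"...\"}";
        producer.send(new KeyedMessage<String, String>("canonical-records", record));
        producer.close();
    }
}
```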
Confidential, Burlington,MA
Hadoop Developer
Responsibilities:
- Performed sentiment analysis of consumer opinions about the product and the company.
- Our team designed and implemented the Hadoop architecture to perform web crawls and store data in HDFS.
- Developed the code base to stream data from sample data files > Kafka > Kafka spout > Storm bolt > HDFS bolt (a topology sketch follows this section).
- Set up the NameNode and formatted the data nodes for HDFS from the NameNode.
- Started deploying IBM InfoSphere BigInsights V3.0 for use with the Hadoop ecosystem.
- Scheduled jobs in BigInsights for the testing environment using Solr on SUSE 11.
- Trained interns to use Hadoop and modify the cluster according to requirements.
- Configured Hadoop configuration files (conf/masters, conf/slaves, conf/*-site.xml) for the master and all machines.
- Scheduled jobs with the Oozie workflow engine, managing actions both sequentially and in parallel.
- Optimized the code base to run independent tasks in a distributed manner to improve performance.
- Wrote MapReduce programs to process crawled data stored in HDFS.
- Ran different MapReduce jobs to analyze data with the help of the data science team.
- To perform sentiment analysis, used Storm to ingest real-time data from the Twitter streaming API.
- Manipulated, serialized, and modeled data in multiple formats (JSON, XML).
- Configured, deployed, and maintained a single-node ZooKeeper cluster in the DEV environment.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Transferred data from MySQL to the HBase environment using Sqoop.
Environment: Hadoop 0.20, HBase, MapReduce, Storm, Java, Amazon EC2, Kafka, Scala, Sqoop
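To accompany the Kafka-to-Storm streaming bullet above, here is a minimal, hypothetical topology sketch using the storm-kafka integration of that era; the ZooKeeper host, topic, and component names are assumptions, and the downstream HDFS bolt is omitted for brevity.

```java
// Illustrative sketch only: wires a Kafka spout into a simple pass-through bolt.
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class KafkaToStormSketch {

    // Trivial bolt that re-emits each record; a real topology would parse the
    // record and hand it to an HDFS writer bolt.
    public static class PassThroughBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            collector.emit(new Values(input.getString(0)));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("record"));
        }
    }

    public static void main(String[] args) {
        SpoutConfig spoutConfig =
                new SpoutConfig(new ZkHosts("zk-host:2181"), "sample-topic", "/kafka-storm", "sample-reader");
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout(spoutConfig), 1);
        builder.setBolt("pass-through", new PassThroughBolt(), 1).shuffleGrouping("kafka-spout");

        // Local-mode submission for a quick test run.
        new LocalCluster().submitTopology("kafka-to-storm-sketch", new Config(), builder.createTopology());
    }
}
```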
Confidential, Atlanta, GA
Hadoop Developer
Responsibilities:
- Designed docs and specs for near-real-time data analytics using Hadoop and HBase.
- Installed Cloudera Manager 3.7 on the clusters.
- Used a 60 node cluster with Cloudera Hadoop distribution on Amazon EC2.
- Developed ad-click based data analytics for keyword analysis and insights.
- Crawled public posts from Facebook and tweets from Twitter.
- Wrote MapReduce jobs with the data science team to analyze this data (a minimal job sketch follows this section).
- Converted the output to structured data and imported it into Spotfire with the analytics team.
- Defined problems to identify the right data and analyzed results to lay the groundwork for new projects.
- TIBCO Spotfire with an in-house custom application was used to perform and generate analytics.
Environment: Hadoop 0.20, HBase, HDFS, MapReduce, Java, Spotfire, Cloudera Manager 2, Amazon EC2-Classic
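As a companion to the MapReduce analysis bullet above, here is a minimal, illustrative job that counts clicks per keyword; the class names and the assumed input layout (tab-separated records with the keyword in the second column) are hypothetical, not the original code.

```java
// Illustrative MapReduce job: counts click records per keyword.
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class KeywordClickCount {

    public static class ClickMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text keyword = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Assumes tab-separated click records with the keyword in the second column.
            String[] fields = value.toString().split("\t");
            if (fields.length > 1) {
                keyword.set(fields[1]);
                context.write(keyword, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "keyword-click-count");
        job.setJarByClass(KeywordClickCount.class);
        job.setMapperClass(ClickMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```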
Confidential, Raleigh, NC
Application developer J2EE
Responsibilities:
- Developed JavaScript behavior code for the user interface (UI).
- Created a database program in SQL Server to manipulate data accumulated from internet transactions.
- Developed servlet classes to generate dynamic HTML pages (a minimal servlet sketch follows this section).
- Developed back-end classes and servlets on the WebSphere application server.
- Developed an API to write XML documents from a database server.
- Tested application usability and performance using JUnit.
- Maintained a Java GUI application using JFC/Swing.
- Created complex SQL and accessed the database using JDBC connectivity.
- Involved in the design and coding of the data capture templates, presentation and component templates.
- As part of the team, designed, customized, and implemented metadata search and database synchronization.
- Used Toad for query execution, wrote SQL scripts and PL/SQL code for procedures and functions, and used Oracle as the database.
Environment: Java, JavaScript, Servlets, WebSphere 3.5, EJB, JDBC, SQL, JUnit, Eclipse IDE, Apache Tomcat 6
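To illustrate the dynamic-HTML servlet bullet above, here is a minimal, hypothetical servlet sketch; the class name and page content are illustrative only.

```java
// Illustrative servlet: writes a small dynamic HTML page based on a request parameter.
import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class GreetingServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        String user = request.getParameter("user");
        out.println("<html><body>");
        out.println("<h1>Hello, " + (user != null ? user : "guest") + "</h1>");
        out.println("</body></html>");
    }
}
```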
Confidential
Java Application Developer
Responsibilities:
- Created class and sequence diagrams along with data flow diagrams using UML.
- Developed use case, sequence, business, and data models using IBM Rational Rose.
- Developed JSPs with Struts custom tags and implemented JavaScript data validation.
- Developed web applications using the Struts framework.
- Developed UI using JSP, HTML, CSS, JavaScript.
- Implemented the Struts framework following the MVC pattern (a minimal Action class sketch follows this section).
- Used the Eclipse IDE to build the application.
- Created validation.xml, struts-config.xml, and web.xml to integrate all components in the Struts framework.
- Used log4j as the logging framework.
- Worked with Struts tags and used Struts as the front-end controller for the web application.
- SVN was used to manage application versions.
- Helped in developing user manuals and product documentation.
Environment: Java/J2EE, Oracle 10g, Struts 1.2, Hibernate 3, WebLogic 10.0, HTML, AJAX, JavaScript, XML, UML, JMS, JDBC, log4j, WebSphere, IBM Rational Rose, Eclipse 3.4.2 & 3.5
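To accompany the Struts/MVC bullets above, here is a minimal, hypothetical Struts 1.2 Action class; the class name is illustrative and the forward name is assumed to match an entry in struts-config.xml.

```java
// Illustrative Struts 1.x Action: the controller-side handler in the MVC pattern.
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

public class LoginAction extends Action {
    @Override
    public ActionForward execute(ActionMapping mapping, ActionForm form,
            HttpServletRequest request, HttpServletResponse response) throws Exception {
        // "success" must correspond to a <forward> entry in struts-config.xml.
        return mapping.findForward("success");
    }
}
```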