Hadoop Developer/Admin Resume
Atlanta, GA
SUMMARY
- 7+ years of experience with an emphasis on Big Data technologies and the development and design of Java-based enterprise applications.
- Three years of experience in Hadoop development and four years of Java application development.
- Hands-on experience using Hadoop technologies such as HDFS, Hive, Pig, Sqoop, HBase, Impala, Flume, Spark, Oozie, and MapReduce.
- Involved in setting up standards and processes for Hadoop-based application design and implementation.
- Logical implementation of and interaction with HBase.
- Developed a data pipeline using Kafka and Storm to store data in HDFS.
- Experience using Sqoop, Zookeeper, and a cloud-based computing manager.
- Services through Zookeeper.
- Experience with the NoSQL database HBase.
- Responsible for managing data coming from different sources.
- Worked with different Hadoop distributions such as Hortonworks and Cloudera.
- Worked on tuning the performance of Pig queries.
- Worked on creating MapReduce scripts for processing data.
- Developed Java code that streams Packet Tracer data into Hive using RESTful services.
- Experience in maintaining the big data platform using open-source technologies such as Spark.
- Experience in ETL development using Kafka, Flume, and Sqoop.
- Involved in performance tuning of Spark jobs using caching and taking full advantage of the cluster environment.
- Proficient in performance analysis, monitoring, and SQL query tuning using EXPLAIN PLAN, Collect Statistics, hints, and SQL Trace in both Teradata and Oracle.
- Loaded data from different sources (databases and files) into Hive using the Talend tool.
- Redesigned the existing Informatica ETL mappings and workflows using Spark with Python.
- Worked with business teams and created Hive queries for ad hoc access.
- Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
- Prepared, arranged, and tested Splunk search strings and operational strings.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Evaluated and proposed new tools and technologies to meet the needs of the organization.
- An excellent team player and self-starter with good communication skills and proven abilities to finish tasks before target deadlines.
- Outstanding communication and presentation skills; willing to learn and adapt to new technologies and third-party products.
- Flexible, with the ability to balance multiple projects at one time in a fast-paced environment.
TECHNICAL SKILLS
Big Data: HDFS, MapReduce, Hive, Pig, HBase, Sqoop, Flume, Oozie, Zookeeper, MongoDB, Spark, Cloudera, Hortonworks, Splunk.
Programming: Java, C/C++, SAS, JavaScript, Python, R, PL/SQL, UNIX shell scripting.
Operating Systems: Windows, Ubuntu, Red Hat Linux, UNIX.
PROFESSIONAL EXPERIENCE
Hadoop Developer/Admin
Confidential, Atlanta, GA
Responsibilities:
- Developed Hive and Python scripts for data validity checks and updates, using Oozie to manage workflows.
- Worked on Data capacity planning and node forecasting.
- Experience in managing Hadoop jobs and the logs of all scripts.
- Worked on migrating data from MongoDB to Hadoop.
- Supported the daily/weekly AWS batches in the production environment.
- Developed Kafka producers and consumers, HBase clients, and Spark and Hadoop MapReduce jobs, along with components on HDFS and Hive.
- Imported data from Oracle/SQL Server into HDFS on a regular basis using Sqoop.
- Designed and implemented MapReduce jobs to support distributed processing using Java, Hive, and Apache Pig.
- Worked on Big Data integration and analytics based on Hadoop, Solr, Spark, Kafka, Storm, and webMethods.
- Processed large data sets in parallel across the Hadoop cluster for pre-processing.
- Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
- Used Spark SQL to process large amounts of structured data.
- Performed ETL processing using Pig and Sqoop and application programming using Hive, Java, and Python.
- Executed MapReduce programs to cleanse data in HDFS gathered from heterogeneous data sources, making it suitable for ingestion into the Hive schema for analysis.
- Wrote Spark applications in Scala using the DataFrame and Spark SQL APIs.
- Strong knowledge of distributed systems architecture and parallel processing; in-depth understanding of the MapReduce programming paradigm.
- Installed and administered Splunk for monitoring, analyzing, and visualizing machine data.
- Worked with the Splunk developers' team to get Hadoop dashboards created with various metrics.
- Imported data into HDFS using Sqoop, including incremental loads.
- Designed and developed MapReduce jobs to process logs and feed the data warehouse, load Hive tables for analytics, and store the daily data feed on HDFS for other teams' use.
- Experience in analyzing log files for HDFS.
- Used Apache NiFi to load PDF documents from Microsoft SharePoint into HDFS.
- Wrote Java scripts that execute various MongoDB queries.
- Implemented data loading using Spark, Storm, Kafka, and Elasticsearch.
- Managed and reviewed Hadoop log files.
- Configured, deployed, and maintained multi-node Dev and Test Kafka clusters.
- Imported and exported data between environments such as MySQL and HDFS.
- Developed code according to the tasks assigned for each user story.
- Developed Spark scripts using Scala shell commands as per the requirements.
- Developed Spark code and Spark-SQL/Streaming for faster testing and processing of data.
- Worked with Hadoop designers to troubleshoot MapReduce job failures and issues.
Environment: Apache Hadoop 2.6.0/Hadoop 1.2.0, HDFS, Spark, MapReduce, Hive, Splunk, MongoDB, HBase, Sqoop, Zookeeper, Oozie, Kerberos, MySQL, Linux, Unix scripts, Python, PuTTY, Kafka.
Hadoop Developer/Admin
Confidential, Alpharetta, GA
Responsibilities:
- Worked on analyzing and writing Hadoop MapReduce jobs using Pig and Hive.
- Involved in loading data from the edge node to HDFS using shell scripting.
- Configured a MySQL database to store Hive metadata.
- Pushed data as delimited files into HDFS using Talend Big Data Studio.
- Very good understanding of partitioning and bucketing concepts in Hive; designed both managed and external tables in Hive for optimized performance.
- Created Hive tables, loaded data, and wrote HiveQL queries to further analyze the data.
- Performed end-to-end performance tuning of Hadoop clusters and Hadoop MapReduce routines against very large data sets.
- Performed transformations, cleaning, and filtering on imported data using Hive and MapReduce, and loaded the final data into HDFS.
- Designed and developed the data ingestion component.
- Loaded data into Spark RDDs and performed in-memory data computation to generate the output response.
- Good knowledge of using Apache NiFi to automate data movement between different Hadoop systems.
- Loaded and transformed large sets of structured data from Oracle/SQL Server into HDFS using Talend Big Data Studio.
- Created a self-managed Python script to deploy tests of the technologies and calculate statistics; the script was designed so that future tests of other technologies could be integrated easily.
- Designed and developed big data systems to perform various ETL and MapReduce jobs using R, and loaded data into HBase using bulk load and non-bulk load.
- Responsible for building scalable distributed data solutions using Hadoop.
- Involved in loading data from the Linux file system to HDFS.
- Developed UDFs using Java, along with Pig and Hive queries.
- Good understanding of NoSQL databases.
- Started using Apache NiFi to copy data from the local file system to HDFS.
- Worked on Hive/HBase versus RDBMS; imported data into Hive on HDP and created tables, partitions, indexes, views, queries, and reports for BI data analysis.
- Wrote MapReduce jobs in Java to parse web logs stored in HDFS, and used MRUnit to test and debug the MapReduce programs.
- Worked on tuning the performance of Pig queries.
- Very good experience with both MapReduce 1 (Job Tracker) and MapReduce 2 (YARN).
- Involved in loading data from UNIX file system to HDFS.
- Wrote MapReduce code to process and parse data from various sources and store the parsed data in HBase and Hive using HBase-Hive integration.
- Loaded and transformed large sets of structured, semi-structured, and unstructured data.
- Experience in managing and reviewing Hadoop log files.
- Job management using the Fair Scheduler.
Environment: Red Hat, Apache Hadoop 2.6.0/Hadoop 1.2.0, ETL, MapReduce, Hive, HBase, Pig, Sqoop, Zookeeper, Oozie, Python, Hortonworks, MySQL, Unix, Linux, WinSCP, YARN, Talend.
Hadoop Developer/Admin
Confidential, Sunnyvale, CA
Responsibilities:
- Installed Hadoop, MapReduce, HDFS, and AWS, and developed multiple MapReduce jobs in Pig and Hive for data cleaning and pre-processing.
- Involved in scheduling Teradata and UNIX objects to run jobs on a daily/weekly basis depending on business requirements.
- Experience in ingesting data into Cassandra and consuming the ingested data from Cassandra into Hadoop.
- Upgraded the Cloudera Hadoop ecosystems in the cluster using Cloudera distribution packages.
- Automated the second layer of the MapR-FS backup process to Azure CIFS.
- Developed Pig Latin scripts to extract data from the web server output files and load it into HDFS.
- Assisted in upgrading, configuring, and maintaining various Hadoop infrastructure components such as Pig, Hive, and HBase.
- Solved performance issues in Hive and Pig scripts by applying an understanding of joins, grouping, and aggregation and how they translate into MapReduce jobs.
- Worked on database design and stored procedures in Oracle.
- Created concurrent access for Hive tables with shared/exclusive locks enabled by implementing Zookeeper in the cluster.
- Participated in knowledge transfer sessions with the production support team on business rules, Teradata objects, and job scheduling.
- Experience with NoSQL databases such as HBase and Cassandra.
- Involved in debugging MapReduce jobs using the MRUnit framework and optimizing MapReduce.
- Developed Hive scripts, Pig scripts, and Unix shell scripts for all ETL loading processes and for converting files to Parquet in the Hadoop file system.
- Responsible for gathering requirements, process workflow, data modelling, architecture and design and led application development using Scrum.
- Implemented Cloudera Manager on the existing cluster.
- Generated Java APIs for retrieval and analysis on NoSQL databases such as HBase and Cassandra.
- Created scripts for importing data into HDFS/Hive using Sqoop from DB2.
- Loaded data into HBase using bulk load and non-bulk load.
- Extensively worked with the Cloudera distribution of Hadoop.
- Extensively worked on installation and configuration of the Cloudera distribution for Hadoop (CDH).
- Exported the analyzed data to relational databases using Sqoop for visualization and to generate reports.
Environment: Hadoop, MapReduce, Hive, HBase, Pig, Sqoop, Zookeeper, Oozie, Linux, SQL, Cloudera, Cassandra, Teradata.
Java Developer
Confidential
Responsibilities:
- Architected a JSF, WebSphere, Oracle, Spring, and Hibernate based 24x7 web application.
- Built an end-to-end vertical slice for a JEE-based billing application using popular frameworks like Spring, Hibernate, JSF, Facelets, XHTML, Maven2, and Ajax by applying OO design concepts, JEE and GoF design patterns, and best practices.
- Integrated other sub-systems, such as the loans application, the equity markets online application system, and the documentation system, with the structured products application through JMS, WebSphere MQ, SOAP-based web services, and XML.
- Designed the logical and physical data model, generated DDL scripts, and wrote DML scripts for an Oracle 9i database.
- Tuned SQL statements, Hibernate mappings, and the WebSphere application server to improve performance, and consequently met the SLAs.
- Gathered business requirements and wrote functional specifications and detailed design documents.
- Improved the build process by migrating it from Ant to Maven2.
- Built and deployed Java applications into multiple Unix based environments and produced both unit and functional test results along with release notes.
Environment: Java 1.5, JSF Sun RI, Facelets, Ajax4JSF, RichFaces, Spring, XML, XSL, XSD, XHTML, Hibernate, Oracle 9i, PL/SQL, MINA, Spring-WS, SOAP web services, WebSphere, JMX, Ant, Maven2, Continuum, JUnit, SVN, TDD, and XP.
Java Developer
Confidential
Responsibilities:
- Designed and developed UI using Struts view tags (HTML, Bean, Logic and Nested), JSP, HTML, CSS and Struts Tiles.
- Configured Struts, Spring, and Hibernate for the development environment.
- Designed and developed parser classes using SAX parsers.
- Involved in developing SOAP messages as part of web services testing.
- Developed XSD schemas.
- Developed XML beans using XSD schemas.
- Used WebLogic Workshop as the development environment.
- Configured the data source in WebLogic Server.
- Used Subversion for source code control.
- Developed Ant scripts for generating the XML Beans.
- Generated DAO and POJO classes and Hibernate mapping files using Hibernate tools.
- Modified DAO classes and Hibernate mapping files as per the application standard.
- Used log4j and JUnit for logging and testing the application.
Environment: XML, XMLBeans, SAX parsing, HTTP sessions, Eclipse, WebLogic Workshop, SOAP messages, Oracle 10g, XSD, Ant scripts.
Software Developer
Confidential
Responsibilities:
- Designed architecture, requirements specifications, use case diagrams and sequence diagrams using UML.
- Developed a Java-based GUI for storing temperature data.
- Created code to connect the VME to a PC via an RS-232 cable.
- Wrote code to reboot VxWorks, start acquisition of data from IP modules and store it into text files.
- Wrote C++ code to calculate minimum, maximum, and average measurements from 94 sensors; based on the temperature, data was sent to cryogenics to cool overheated magnets.
- Wrote code to show an online graph using Java Swing and Graphics.
- Wrote code to show an offline graph based on user requirements.
- Wrote code for getting channel data through LabVIEW using a socket connection.
- Wrote documentation for the project.
Environment: VME (Versa Module Europa), IP modules, VxWorks/Tornado RTOS, J2EE, C++, socket programming, Core Java, multithreading, Swing, UML, HTML, JDBC, TCP/IP.
