Tech Lead Resume
SUMMARY
- Over 11 years of IT experience, including 4 years in Big Data & Hadoop using Hive, Sqoop, HBase, Pig, Python, Java, Avro, and Hadoop streaming with MapReduce.
- 2 years of experience in Spark with Scala, PySpark & Spark SQL.
- Built automation frameworks using Python, VB scripts & shell scripts.
- Worked on data ingestion into HDFS using Sqoop from sources such as Netezza and DB2.
- Good hands-on experience in Big Data analytics.
- Hands-on Big Data architect experience, involved in creating project designs, implementation plans, and timelines.
- Very good understanding of Hadoop architecture and MapReduce concepts.
- Extensively worked in Hive with partitioning, bucketing, and performance tuning.
- Experience developing Big Data projects on MapR & IBM BigInsights.
- Handled social media projects to build Big Data platforms that give customers actionable intelligence.
- Experience developing custom UDFs in Java to extend Hive and Pig Latin functionality.
- Experience writing MapReduce programs in Python (a minimal streaming sketch follows this summary).
- Experience in data import/export and ingestion into HDFS/GPFS using Sqoop.
- Experience with HBase (NoSQL) table design and data loading.
- Strong knowledge of SAS BI tools, Base SAS/macro programming, R, Python & shell scripting.
- Strong knowledge of SQL and data modeling concepts.
- Created conceptual, logical, and physical/relational models for OLTP and OLAP applications, including E/R diagrams, normalization, and de-normalization.
- Experienced in creating data warehouses/marts and dimensional models using star and snowflake schemas.
- SAS ETL/report development, implementation, and job scheduling.
- Working experience in SAS & R production, support & maintenance projects in Unix/Windows/z/OS environments.
- Development and implementation exposure to SAS fraud analytics applications (SAS SNA).
- Exclusively worked in a SAS Grid computing environment on 64-bit Linux.
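
A minimal sketch of the kind of Hadoop streaming MapReduce program referenced above; the tab-delimited input layout and key column are illustrative assumptions, not production code:

```python
#!/usr/bin/env python
# mapper.py -- emits one (key, 1) pair per input record.
# Assumes tab-delimited records with the key in the first column.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if fields and fields[0]:
        print("%s\t1" % fields[0])
```

```python
#!/usr/bin/env python
# reducer.py -- sums the counts per key (streaming delivers keys sorted).
import sys

current_key, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t", 1)
    if key != current_key and current_key is not None:
        print("%s\t%d" % (current_key, count))
        count = 0
    current_key = key
    count += int(value)
if current_key is not None:
    print("%s\t%d" % (current_key, count))
```

A pair like this runs through the streaming JAR, e.g. `hadoop jar hadoop-streaming.jar -input /in -output /out -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py`.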
TECHNICAL SKILLS
Big Data Ecosystems: Hadoop, Spark, MapReduce, GPFS/HDFS, HBase, IBM Symphony, Hive, Pig, Sqoop & JSON
Data Ingestion: Sqoop and automated ingestion frameworks
Analytics Tools: SAS, R & Tableau
Programming Languages: Python, Java
Scripting Languages: Bash & VB Script
Databases: NoSQL (HBase), Netezza
Tools: Eclipse, SAS Enterprise Guide, SAS DI Studio
Platforms: Windows 2008/2012, Linux & z/OS (Mainframe)
Methodologies & Tools: Agile, RTC, Git
PROFESSIONAL EXPERIENCE
Confidential
Tech Lead
Responsibilities:
- Architecture and design of code/data migration into Big Data.
- Source-to-target mapping and technical documentation.
- Data governance and security policies in Big Data.
- Designed, implemented, and maintained data ingestion for the Netezza analytical tables.
- Code conversion from SAS to Hive, Pig, or Spark per technical requirements.
- Conversion of analytical and scoring models to Spark/Hadoop streaming jobs using Java/Python.
- Studied the optimal way to convert SAS models using Spark.
- Followed the Agile framework to communicate and distribute tasks to the team.
- Data validation of ingested tables and business sign-off.
- Data requirement analysis and documentation with a third-party vendor.
- Architecture design and project setup.
- Extensively used Python as the programming layer for Hadoop tasks.
- Imported enterprise (Netezza) data into Hive using Sqoop to join it with external sensor data (see the Sqoop sketch after this section).
- Metadata analysis, data volume estimation, and forecasting for MapReduce design.
- Re-engineered existing MapReduce jobs in Spark using Python (see the PySpark sketch after this section).
- Extensive experience creating and configuring HBase tables (see the HBase sketch after this section).
- Established data quality checks and validations using Pig.
- Wrote Pig scripts to extract, transform, and load data into HBase tables; tested and optimized MapReduce jobs using the Hadoop streaming JAR; currently re-engineering a portion of the project using Spark.
- Met with the business team to gather requirements.
- Created analysis and design documentation.
- Created Python/shell scripts to automate Sqoop downloads of data from Netezza.
- Automated handling of metadata changes in Netezza tables and accommodated them in Hive by building Avro storage and Avro schemas (see the schema sketch after this section).
- Post-development activities such as moving scripts into production and scheduling jobs.
- Maintenance and support.
- Analyzed sample CSV data sources from the vendor.
- Architecture design and planning of optimization techniques.
- Analyzed data model/table metadata from the Gamification DB for member identification.
- Built Hive external and managed tables to store historical data.
- Developed Pig scripts to analyze and process data stored in Hive.
- Developed complex Hive queries to join multiple tables.
- Analyzed large data sets using Hive queries and Pig scripts.
- Used the Hadoop streaming JAR for parallel processing of XML files and stored the results in Hadoop.
- Extensive experience setting MapReduce configurations through the Hadoop streaming JAR, Pig, and Hive.
- Manipulated huge files by bringing them into the current working directory where the MapReduce program runs.
- Extensive experience writing Python scripts for data integrity checks and validations (see the validation sketch after this section).
- Wrote user-defined functions in Java and used them in Pig Latin scripts.
- Production project setup & building Control-M jobs.
Environment: Red Hat Linux 6, Spark/Hadoop Streaming, MapReduce, Netezza, HBase, Scala, Python, Shell Script, Pig
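
A sketch of the Sqoop automation described above, with Python driving the sqoop CLI; the host, database, credentials path, and table names are hypothetical:

```python
#!/usr/bin/env python
# sqoop_import.py -- automates Sqoop imports from Netezza into Hive.
# Connection string, password file, and table list are hypothetical.
import subprocess

TABLES = ["MEMBER_SCORES", "TXN_SUMMARY"]  # illustrative table list

def sqoop_import(table):
    subprocess.check_call([
        "sqoop", "import",
        "--connect", "jdbc:netezza://nz-host:5480/PROD_DB",
        "--username", "etl_user",
        "--password-file", "/user/etl_user/.nz_password",
        "--table", table,
        "--hive-import",
        "--hive-table", "analytics.%s" % table.lower(),
        "--num-mappers", "4",
    ])

if __name__ == "__main__":
    for t in TABLES:
        sqoop_import(t)
```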
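
For the metadata-change handling, one way to absorb Netezza column changes is to regenerate the Avro schema (.avsc) that backs the Hive table whenever the source catalog changes. A sketch, where the type map and sample metadata are assumptions:

```python
#!/usr/bin/env python
# avro_schema_gen.py -- builds a nullable Avro record schema from table
# metadata. The Netezza-to-Avro type map is a simplified assumption.
import json

NZ_TO_AVRO = {"INTEGER": "int", "BIGINT": "long", "VARCHAR": "string",
              "DATE": "string", "NUMERIC": "double"}

def build_schema(table, columns):
    """columns: list of (name, netezza_type) tuples pulled from the catalog."""
    fields = [{"name": name,
               "type": ["null", NZ_TO_AVRO.get(nz_type, "string")],
               "default": None}
              for name, nz_type in columns]
    return {"type": "record", "name": table, "fields": fields}

if __name__ == "__main__":
    cols = [("member_id", "BIGINT"), ("score", "NUMERIC")]  # sample metadata
    print(json.dumps(build_schema("member_scores", cols), indent=2))
```

Making every field a nullable union with a null default is what lets existing Avro-backed Hive tables keep reading older files after a column is added.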
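
The HBase loads above ran through Pig; purely as an illustration of the table design and load pattern, here is the same create-and-load expressed with the happybase Thrift client (host, table, and column family names are assumptions):

```python
#!/usr/bin/env python
# hbase_load.py -- creates an HBase table and does a batched load via the
# HBase Thrift server using happybase. Names are hypothetical.
import happybase

connection = happybase.Connection("hbase-thrift-host")  # hypothetical host

# A single short column family keeps HFiles per region to a minimum.
if b"member_profile" not in connection.tables():
    connection.create_table("member_profile", {"d": dict(max_versions=1)})

table = connection.table("member_profile")
with table.batch(batch_size=1000) as batch:
    for member_id, segment in [("m001", "gold"), ("m002", "silver")]:  # sample rows
        batch.put(member_id, {b"d:segment": segment.encode()})
```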
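
A sketch of the MapReduce-to-Spark re-engineering: the streaming mapper/reducer pattern shown after the summary collapses to a few RDD operations (the paths and tab-delimited layout are assumptions):

```python
#!/usr/bin/env python
# spark_reengineer.py -- PySpark rewrite of a per-key count that previously
# ran as a streaming mapper/reducer pair. Paths are hypothetical.
from pyspark import SparkContext

sc = SparkContext(appName="mapred-to-spark")

counts = (sc.textFile("/data/raw/events")          # hypothetical HDFS path
            .map(lambda line: (line.split("\t")[0], 1))
            .reduceByKey(lambda a, b: a + b))

counts.saveAsTextFile("/data/out/event_counts")    # hypothetical output path
sc.stop()
```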
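
A sketch of the post-ingestion integrity checks: compare the row count recorded at export time against the ingested Hive table (the table names and expected count are assumptions):

```python
#!/usr/bin/env python
# validate_ingest.py -- fails loudly when a Hive table's row count does not
# match the source count captured during export. Names are hypothetical.
import subprocess
import sys

def hive_count(table):
    out = subprocess.check_output(
        ["hive", "-S", "-e", "SELECT COUNT(*) FROM %s" % table])
    return int(out.decode().strip())

def validate(table, expected):
    actual = hive_count(table)
    if actual != expected:
        sys.exit("FAIL: %s has %d rows, expected %d" % (table, actual, expected))
    print("OK: %s row count matches (%d)" % (table, actual))

if __name__ == "__main__":
    validate("analytics.member_scores", expected=1048576)  # illustrative count
```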
Confidential
SAS/Data Architect
Responsibilities:
- Participated in data modeling & design as a downstream team.
- SAS job performance tuning and optimization.
- Created SAS analytical reports that let business users make decisions quickly.
- Migrated historical data to Hadoop as Hive tables (see the migration sketch after this section).
- Identified trusted data sources for critical analytical processes.
- Pilot study of new analytical tools such as R and SPSS.
- Pilot study of reporting/analytical tools built on top of Hadoop, using Tableau, Lumera, etc.
- SAS platform administration.
- Security implementation and risk analysis.
- Linux administration.
- R server administration.
- User, group & role management.
- Troubleshooting server/client issues.
- Automation using shell scripting, Base SAS, and R programming.
- Client tools packaging and installation.
- EG and EM remote application management.
Environment: Red Hat Linux 6, Windows 2008, Base SAS, SAS Management Console 9.2, RStudio, Netezza, Hive, Shell Script, Python, R.
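
A sketch of the historical-data migration into Hive, assuming the history arrives as CSV extracts (for example, exported from SAS); the staging path, schema, partition column, and table names are hypothetical:

```python
#!/usr/bin/env python
# migrate_history.py -- loads historical CSV extracts into a managed,
# partitioned Hive table via PySpark. All names/paths are hypothetical.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("historical-migration")
         .enableHiveSupport()
         .getOrCreate())

hist = (spark.read
        .option("header", "true")
        .option("inferSchema", "true")
        .csv("/staging/sas_exports/claims_2010_2014"))  # hypothetical path

(hist.write
     .mode("overwrite")
     .partitionBy("claim_year")                 # hypothetical partition column
     .saveAsTable("warehouse.claims_history"))  # hypothetical Hive table
```

Partitioning on a year-style column keeps queries over a single period from scanning the full history.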